

| Sections                                       |                                   |
|------------------------------------------------|-----------------------------------|
| Dections                                       |                                   |
| Representing and Manipulating Information . 2  | The Memory Hierarchy90            |
| 1.0.1 Assigned Reading                         | · · ·                             |
| 1.0.2 Piazza                                   |                                   |
| 1.0.3 Lectures                                 | <b>8.0.3</b> Lectures 90          |
| 1.0.4 Assignments                              | 8.0.4 Assignments 90              |
| 1.0.5 Quiz                                     |                                   |
| 1.0.6 Chapter Summary 4                        | 8.0.6 Chapter Summary 91          |
| Representing and Manipulating Information . 20 | Exceptional Flow Control 101      |
| 2.0.1 Assigned Reading 20                      | 9.0.1 Assigned Reading 101        |
| <b>2.0.2</b> Piazza                            | 9.0.2 Piazza 101                  |
| <b>2.0.3</b> Lectures                          | <b>9.0.3</b> Lectures 101         |
| <b>2.0.4</b> Assignments                       |                                   |
| <b>2.0.5</b> Quiz                              | <b>9.0.5</b> Chapter Summary 102  |
| <b>2.0.6</b> Chapter Summary 21                | Exceptional Flow Control          |
| Machine-Level Representation of Programs 29    |                                   |
| 3.0.1 Assigned Reading 29                      | 10.0.2 Piazza 108                 |
| 3.0.2 Piazza                                   | <b>10.0.3</b> Lectures 108        |
| <b>3.0.3</b> Lectures                          | <b>10.0.4</b> Assignments 108     |
| <b>3.0.4</b> Assignments                       | <b>10.0.5</b> Chapter Summary 108 |
| <b>3.0.5</b> Quiz                              | Virtual Memory112                 |
| <b>3.0.6</b> Chapter Summary 30                | 11.0.1 Assigned Reading 112       |
| Machine-Level Representation of Programs 40    | 11.0.2 Piazza                     |
| 4.0.1 Assigned Reading 40                      | 11.0.3 Lectures                   |
| 4.0.2 Piazza 40                                | 11.0.4 Assignments                |
| <b>4.0.3</b> Lectures 40                       | 11.0.5 Quiz                       |
| 4.0.4 Assignments 41                           | 11.0.6 Chapter Summary 113        |
| 4.0.5 Quiz 41                                  | Virtual Memory123                 |
| 4.0.6 Exam 41                                  | 12.0.1 Assigned Reading 123       |
| 4.0.7 Chapter Summary 41                       | 12.0.2 Piazza                     |
| Machine-Level Representation of Programs 57    | 12.0.3 Lectures                   |
| 5.0.1 Assigned Reading 57                      | 12.0.4 Assignments                |
| 5.0.2 Piazza                                   |                                   |
| <b>5.0.3</b> Lectures 57                       | <b>12.0.6</b> Exam                |
| 5.0.4 Assignments 57                           |                                   |
| 5.0.5 Quiz 57                                  | LillKillg                         |
| 5.0.6 Chapter Summary 58                       | 13.0.2 Piazza                     |
| Processor Architecture And Optimizing          | 13.0.3 Lectures                   |
| Program Performance 63                         | 13.0.4 Assignments                |
| 6.0.1 Assigned Reading 63                      | 13.0.5 Chapter Summary 130        |
| 6.0.2 Piazza 65                                | Linking                           |
| 6.0.3 Lectures 65                              |                                   |
| 6.0.4 Assignments 64                           |                                   |
| 6.0.5 Quiz 64                                  |                                   |
| 6.0.6 Chapter Summary 64                       |                                   |
| Optimizing Program Performance 76              |                                   |
| 7.0.1 Assigned Reading 76                      |                                   |
| 7.0.2 Piazza                                   |                                   |
| 7.0.3 Lectures                                 | v                                 |
| 7.0.4 Assignments                              |                                   |
| 7.0.5 Quiz                                     |                                   |
| 7.0.6 Exam                                     |                                   |
| 7.0.7 Chapter Summary 77                       |                                   |
|                                                |                                   |

Week 1: (1/15 - 1/21)

# Representing and Manipulating Information

## 1.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 1
- Computer Systems: Chapter 2.1, 2.2, 2.3

#### 1.0.2 Piazza

Piazza discussions are optional this week. (1/23/24)

#### 1.0.3 Lectures

The lecture videos for this week are:

- C++ Review  $\approx 49$  min.
- Review Of C Switch Statement  $\approx 5$  min.
- Course Overview  $\approx 27$  min.
- Bits And Bytes  $\approx 30$  min.
- Bytes, Byteorder & Strings  $\approx 30$  min.
- C String And Memory Functions  $\approx 16$  min.
- Integer Representation  $\approx 35$  min.
- Integer Arithmetic  $\approx 46$  min.
- Data Lab Orientation  $\approx 13$  min.

The lecture notes for this week are:

- Course Overview
- Bits And Bytes
- Strings and Memory Representations
- Integer Representation
- Integer Mathematical Operations And Memory Representations

## 1.0.4 Assignments

The assignment for this week is:

- Data Lab (1/30/24)
- Data Lab Extra Credit (1/30/24)
- Data Lab Interview (1/30/24)

## 1.0.5 Quiz

The quizzes for this week are:

- Chapter 1 & 2.1 Quiz Chapter 1 & 2.1 Finalized Quiz (1/23/24)
- Chapter 2.2 Quiz Chapter 2.2 Finalized Quiz (1/23/24)
- Chapter 2.3 Quiz Chapter 2.3 Finalized Quiz (1/23/24)

## 1.0.6 Chapter Summary

The first chapter that we will be reviewing this week is **Chapter 1: A Tour Of Computer Systems**. The first section from this chapter is **Section 1.1: Information Is Bits + Context**.

#### Section 1.1: Information Is Bits + Context

#### Overview

Information in computer systems is fundamentally represented as **bits** (binary digits), which are the most basic form of data in computing. These bits are typically in the form of 0s and 1s.

#### **Bits**

A single bit can represent two states, often interpreted as on/off, true/false, or 0/1. The power of bits in computing comes from their ability to be combined. Multiple bits can represent more complex information:

- 8 bits form a byte, which can represent 256 different values.
- 16 bits, 32 bits, and 64 bits (and so on) are used to represent increasingly large or precise values.

#### Context

Context is what gives bits meaning. It's the rules or conventions used to interpret a series of bits:

- In text encoding (like ASCII or Unicode), specific sequences of bits correspond to characters.
- In digital images, bits represent pixels and color values.
- In audio files, bits encode sound waves.

Without context, a string of bits is just a random sequence of 0s and 1s. With context, that same sequence can convey a letter, a color, a sound, or any other type of information.

## Importance in Computing

Understanding how bits and context work together is fundamental in computer systems. This concept is crucial in:

- Data Storage and Retrieval
- Information Transmission
- Data Encryption and Security
- Software Development and Programming

This dual nature of digital information is what allows computers to process and manipulate diverse types of data, from simple text documents to complex multimedia.

The next section that is covered in this chapter is **Section 1.2: Programs Are Translated by Other Programs Into Different Forms**.

## Section 1.2: Programs Are Translated by Other Programs Into Different Forms

#### Overview

In computer systems, programs written by humans in high-level programming languages are not directly executed by computers. Instead, they are translated into a form that the machine can understand. This translation is done by other programs.

## Compilation and Interpretation

There are two primary ways in which human-readable code is translated into machine code:

- Compilation: A compiler translates the high-level code (like C, C++) into machine code (binary code) before the program is run. This process creates an executable file.
- **Interpretation:** An interpreter translates the high-level code (like Python) into machine code on-the-fly, during program execution. This does not create a standalone executable file.

## Intermediate Languages

Some languages use a combination of both methods. Languages like Java are first compiled into an intermediate language (like Java bytecode) which is then interpreted by a virtual machine (like the Java Virtual Machine, or JVM).

## Why Translation Is Necessary

Translation makes it possible to write software in a human-readable form while still allowing the computer to execute it efficiently. It also enables the same code to be run on different types of hardware, with only the translator (compiler/interpreter) needing to be specific to the hardware.

## **Key Points in Program Translation**

- Syntax Analysis: Checking the code for grammatical correctness.
- Semantic Analysis: Ensuring that the code's meaning and logic are consistent.
- Optimization: Enhancing the code for performance improvements.
- Code Generation: Producing the final machine-readable code.

Understanding the translation process is crucial for programmers, as it affects how they write, debug, and optimize code.

The next section that is covered in this chapter is **Section 1.3:** It Pays to Understand How Compilation Systems Work.

## Section 1.3: It Pays to Understand How Compilation Systems Work

#### Overview

A compilation system translates high-level code into machine code. Understanding how this system works is crucial for programmers for several reasons.

## Benefits of Understanding Compilation Systems

- Optimized Code: Knowledge of how compilers optimize code can guide programmers in writing more efficient and effective code.
- **Debugging:** Understanding the compilation process aids in debugging, especially when dealing with low-level errors or performance issues.

• Language Features: Insight into the compilation process can help in better understanding and utilizing the features of a programming language.

## Key Components of a Compilation System

- Front End: Processes the syntax and semantics of the code, checking for errors and converting code into an intermediate representation.
- Optimizer: Improves the code's efficiency without changing its functionality.
- Back End: Translates the optimized intermediate representation into machine code specific to the target processor.

## **Cross-Compilation**

Cross-compilation involves compiling code on one machine (host) to run on a different machine (target). This is particularly important in developing software for multiple platforms or for embedded systems.

## The Impact of Compilation in Software Development

Understanding the nuances of the compilation process impacts various aspects of software development:

- Enhances the ability to write cross-platform code.
- Facilitates better use of hardware resources.
- Enables deeper understanding of language-specific behaviors and limitations.

A solid grasp of how compilation systems work pays off by enhancing code quality, performance, and portability. It is an essential aspect of computer science that bridges the gap between high-level programming and machine-level execution.

The next section that is covered in this chapter is **Section 1.4: Processors Read and Interpret Instructions Stored in Memory**.

#### Section 1.4: Processors Read and Interpret Instructions Stored in Memory

## Overview

The Central Processing Unit (CPU) is the core component of a computer that performs instructions. These instructions, along with the data they operate on, are stored in memory.

## The Role of Memory

- Memory in a computer system stores both the instructions for programs (in the form of machine code) and the data those programs manipulate.
- There are different types of memory, like RAM (Random Access Memory), where data and instructions are stored temporarily.

## Instruction Cycle

The CPU executes instructions stored in memory in a process called the instruction cycle, which consists of:

• **Fetch:** The CPU fetches an instruction from memory.

- Decode: The CPU decodes what the instruction means and what actions are required.
- Execute: The CPU carries out the instruction.
- Store: The results of the execution are written back to memory.

#### Processor Architecture

Processor architecture (like x86, ARM) defines how a processor is designed and what kind of instructions it can execute. This affects how instructions and data are represented in memory.

## Memory Addressing

- Each location in memory has an address, and instructions include references to these addresses for data retrieval and storage.
- CPUs use these addresses to access and manipulate data in memory.

## Importance in Understanding Processor-Memory Interaction

- Enables a deeper comprehension of how software translates into actions a processor can perform.
- Essential for optimizing performance and understanding hardware limitations.
- Important for low-level programming and understanding the execution environment of programs.

This section underscores the fundamental relationship between the processor and memory in executing instructions and managing data, forming the basis of computer operations.

The next section that is covered in this chapter is **Section 1.5**: Caches Matter.

## Section 1.5: Caches Matter

#### Overview

Cache memory is a type of fast, volatile memory that serves as a buffer between the processor and the main memory (RAM). It stores frequently accessed data and instructions, allowing for quicker data retrieval compared to accessing data from the main memory.

## Levels of Cache

There are typically multiple levels of cache:

- Level 1 (L1) Cache: The smallest and fastest, located directly on the processor chip.
- Level 2 (L2) Cache: Larger than L1, slightly slower, but still faster than main memory.
- Level 3 (L3) Cache: Even larger, shared among cores in multi-core processors, and slower than L1 and L2 but faster than RAM.

#### Cache Operation

The main operations of cache memory include:

- Fetching: Data and instructions are pre-fetched from main memory based on anticipated need.
- Storing: Recently or frequently accessed data is stored for quicker access.
- **Updating:** Data in the cache is updated to reflect changes made in the main memory.

#### Cache Miss and Hit

- A cache hit occurs when the data requested by the CPU is found in the cache.
- A cache miss happens when the data is not found in the cache, necessitating access to slower main memory.

## Importance of Caches in Performance

- Reduces the average time to access data from the main memory.
- Enhances overall processing speed and system performance.
- Important for applications requiring quick data retrieval and processing, like gaming and high-performance computing.

Understanding cache memory and its operation is essential for optimizing computer performance, particularly in designing software that maximizes cache efficiency.

The next section covered in this chapter is Section 1.6: Storage Devices Form a Hierarchy.

## Section 1.6: Storage Devices Form a Hierarchy

#### Overview

In computer systems, storage devices are organized in a hierarchy that ranges from the fastest and most expensive to the slowest and least expensive. This hierarchy is designed to provide a balance between performance, cost, and storage capacity.

#### Levels of Storage Hierarchy

- **Primary Storage:** Includes the CPU registers and cache memory. They are the fastest but have the least capacity and are the most expensive per unit of storage.
- **Secondary Storage:** Consists of the main memory (RAM). Slower than primary storage but faster than tertiary storage, with moderate cost and capacity.
- Tertiary Storage: Encompasses long-term storage devices like hard disk drives (HDDs), solid-state drives (SSDs), and optical discs. They are slower in data retrieval but offer high storage capacity at a lower cost.
- Off-Line Storage: Includes removable media and cloud storage. Used for backup and archiving, offering the highest capacity at the lowest cost, but with the slowest access time.

## Trade-offs in Storage Hierarchy

The storage hierarchy involves trade-offs between:

- **Speed:** Faster storage accelerates data access but is more expensive.
- Capacity: Higher capacity storage is essential for large data sets but typically has slower access speeds.
- Cost: Balancing cost against speed and capacity is a key consideration in the design of storage systems.

#### Impact on System Performance

- The choice of storage devices affects overall system performance, especially for data-intensive applications.
- Effective management of the storage hierarchy is crucial in optimizing performance and cost.

Understanding the storage hierarchy is important for making informed decisions about data storage and retrieval strategies, particularly in system design and application development.

The next section of this chapter is Section 1.7: The Operating System Manages the Hardware.

## Section 1.7: The Operating System Manages the Hardware

#### Overview

The operating system (OS) is a critical component of a computer system. It acts as a bridge between the computer's hardware and its software, managing resources and facilitating interaction.

## Key Functions of an Operating System

- Resource Management: Allocates and manages hardware resources like CPU time, memory space, and disk storage.
- Process Management: Handles the creation, scheduling, and termination of processes.
- Memory Management: Manages the allocation and deallocation of memory space for applications and processes.
- **Device Management:** Controls and coordinates the use of hardware devices like printers, disk drives, and display monitors.
- File System Management: Organizes, stores, and retrieves data on storage devices.
- Security and Access Control: Protects system resources from unauthorized access and ensures data security.

#### Types of Operating Systems

Operating systems vary based on their design and purpose, including:

- **Desktop OS:** Designed for personal computers (e.g., Windows, macOS, Linux).
- Server OS: Optimized for server environments (e.g., Linux Server, Windows Server).
- Mobile OS: Tailored for mobile devices (e.g., Android, iOS).
- Embedded OS: Used in embedded systems (e.g., IoT devices, automotive control systems).

## User Interface

Operating systems provide a user interface (UI) to interact with the system:

- Graphical User Interface (GUI): Offers visual and interactive elements.
- Command-Line Interface (CLI): Uses text-based commands for interaction.

## Importance in Computer Systems

Understanding how the operating system manages hardware is crucial for:

- Optimizing software performance.
- Developing applications compatible with different OS environments.
- Ensuring efficient utilization of system resources.

The operating system is fundamental in the functionality of a computer, providing the necessary environment for software applications to run efficiently and effectively.

The next section of this chapter is **Section 1.8: Systems Communicate with Other Systems Using Networks**.

## Section 1.8: Systems Communicate with Other Systems Using Networks

#### Overview

Computer networks enable the exchange of data and resources between multiple systems. This communication is essential for the functioning of the modern digital world.

## Types of Networks

- Local Area Networks (LAN): Networks in a small geographical area, like an office or home.
- Wide Area Networks (WAN): Networks that span a large geographical area, often composed of multiple LANs.
- The Internet: The largest WAN, connecting millions of computers worldwide.
- Wireless Networks: Use radio waves for connectivity (e.g., Wi-Fi).

## **Network Protocols**

Protocols are rules and standards that allow computers to communicate on a network:

- Transmission Control Protocol/Internet Protocol (TCP/IP): The fundamental suite of protocols for the Internet.
- Hypertext Transfer Protocol (HTTP): Used for transmitting web pages.
- File Transfer Protocol (FTP): For transferring files between computers.

#### IP Addresses and Domain Names

- Each device on a network has a unique IP address.
- Domain names (like www.example.com) are human-readable addresses that are translated to IP addresses through Domain Name Systems (DNS).

#### **Data Transmission Methods**

- Packet Switching: Data is sent in small blocks called packets, each possibly taking different paths to the destination.
- Circuit Switching: Establishes a dedicated communication path between nodes before transmitting data.

## **Network Security**

Ensuring secure communication over networks is crucial, involving measures like encryption, firewalls, and secure protocols.

## Importance of Network Communication

- Facilitates resource sharing and collaboration.
- Enables access to remote services and the Internet.
- Critical for the functioning of distributed systems and cloud computing.

Understanding how systems communicate through networks is vital for developing networked applications, managing data transfer, and ensuring security in digital communications.

The next section of this chapter is **Section 1.9: Important Themes**.

## Section 1.9: Important Themes

#### Amdahl's Law

Amdahl's Law is a principle that predicts the theoretical maximum improvement in system performance when only a part of the system is improved. It is often used in the context of parallel computing to understand the benefits of increasing the number of processors:

- The law states that the overall performance improvement gained by optimizing a particular part of a system is limited by the fraction of time that the improved part is actually used.
- It highlights the importance of identifying and optimizing the bottleneck in a system to achieve significant performance gains.

## Concurrency and Parallelism

Concurrency and parallelism are key concepts in computer systems, enabling more efficient processing:

- Concurrency: Involves multiple tasks making progress simultaneously. It's more about dealing with lots of things at once (like handling multiple users or tasks).
- Parallelism: Refers to multiple tasks or processes running at the same time, often using multiple processors or cores. It's about doing lots of things at the same time.
- Understanding these concepts is crucial for writing efficient programs, especially in an era where multi-core processors are common.

#### The Importance of Abstractions in Computer Systems

Abstractions in computer systems are simplifications of complex reality that help to manage complexity by hiding lower-level details:

- They allow programmers to focus on higher-level problems without worrying about the underlying implementation details.
- Examples include high-level programming languages, APIs (Application Programming Interfaces), and software libraries.

• Effective use of abstractions is key to building complex systems and contributes to better software design and architecture.

Understanding these themes is crucial for computer scientists and engineers, as they underpin many aspects of computer systems design and optimization.

The last section of this chapter is **Section 1.10: Summary**.

#### Section 1.10: Summary

#### Information Is Bits + Context

- Information in computer systems is represented as bits (binary digits).
- Bits combined in different ways can represent complex data.
- Context gives meaning to these bits, like text encoding, image pixels, or audio files.

## Programs Are Translated by Other Programs Into Different Forms

- High-level code is translated into machine code by compilers or interpreters.
- Compilation translates code before it is run, while interpretation translates on-the-fly.
- Some languages use a combination of both, e.g., Java with bytecode and the JVM.

## It Pays to Understand How Compilation Systems Work

- Understanding compilation helps in writing efficient code and effective debugging.
- Compilation involves syntax and semantic analysis, optimization, and code generation.

#### Processors Read and Interpret Instructions Stored in Memory

- The CPU executes instructions stored in memory through an instruction cycle.
- Memory stores program instructions and data, with different types of memory available.

#### Caches Matter

- Cache memory stores frequently accessed data, allowing quicker data retrieval.
- There are multiple levels of cache, each with different speeds and sizes.
- Cache efficiency significantly impacts overall system performance.

#### Storage Devices Form a Hierarchy

- Storage devices range from primary (like caches) to off-line storage (like cloud storage).
- The hierarchy balances cost, speed, and capacity.

#### The Operating System Manages the Hardware

- The OS acts as an intermediary between hardware and software.
- It manages resources like memory, processes, and file systems.
- Different types of OS are tailored for different environments.

## Systems Communicate with Other Systems Using Networks

- Networks enable data and resource exchange between systems.
- Network types include LAN, WAN, and the Internet.
- Protocols, like TCP/IP and HTTP, standardize communication.

## Important Themes

- Amdahl's Law: Theoretical limits of performance improvement in parallel systems.
- Concurrency and Parallelism: Key for efficient processing in multi-core systems.
- Abstractions: Simplify complexity, allowing focus on higher-level problems.

The next chapter we will be covering is **Chapter 2: Representing And Manipulating Information**. The first section of this chapter is **Section 2.1: Information Storage**.

## Section 2.1: Information Storage

#### **Hexadecimal Notation**

Hexadecimal notation is a base-16 numbering system, bridging the gap between binary representation and human readability. It's essential in computing for simplifying the expression of binary data.

- Uses 16 symbols (0-9 and A-F).
- More compact than binary.
- Common in programming and debugging.

#### **Data Sizes**

Data sizes refer to the space data types occupy in memory, impacting how information is stored and processed in computing.

- Varies with data type (integers, floating points).
- Affects memory allocation.
- Influences system architecture (32-bit vs. 64-bit).

## Addressing and Byte Ordering

Addressing and byte ordering deal with how data is stored and accessed in memory, affecting how multi-byte data is interpreted across different systems.

- Big endian and little endian formats.
- Influences cross-platform compatibility.
- Essential for network data transmission.

#### Representing Strings

String representation in computing involves how sequences of characters are stored and manipulated, particularly in programming languages like C.

• Typically null-terminated in C.

- Depends on character encoding.
- Crucial for text processing.

#### Representing Code

Code representation focuses on how programming instructions are translated into a form understandable by the computer's hardware.

- Varies with CPU architecture.
- Essential for understanding machine-level programming.
- Affects software compatibility.

## Introduction to Boolean Algebra

Boolean algebra forms the basis of logical reasoning in computing, using binary values and operators.

- Binary values (true/false, 1/0).
- Operators like AND, OR, NOT.
- Foundation for digital logic design.

## Bit-Level Operations in C

Bit-level operations involve direct manipulation of individual bits in data, crucial for low-level programming.

- Operators include AND (&), OR (|), XOR (^), and NOT ().
- Used in data encoding, encryption.
- Essential for hardware-level programming.

## Logical Operations in C

Logical operations in C are used for decision-making in programs, evaluating conditions.

- Includes AND (&&), OR (||), NOT (!).
- Critical for control flow in programs.
- Affects program logic and decision-making.

## Shift Operations in C

Shift operations involve moving bits left or right within a data word, used in various computing tasks.

- Includes left shift («) and right shift (»).
- Used in tasks like bit manipulation, quick multiplication or division.
- Important for performance optimization.

The next section in this chapter is **Section 2.2: Integer Representations**.

#### Section 2.2: Integer Representations

## Integral Data Types

Integral data types are fundamental in programming, representing whole numbers with varying sizes and ranges.

#### **Key Aspects**

- Size Variations: Typically include byte, short, int, long, with size variations like 8-bit, 16-bit, 32-bit, 64-bit, etc.
- **Signed vs. Unsigned:** Signed integers can represent negative and positive values, while unsigned integers represent only non-negative values.
- Range: The range of values depends on the size and whether the type is signed or unsigned.
- Usage in Programming: Chosen based on the required value range and memory efficiency.

Understanding these types is crucial for effective programming, especially in scenarios where memory and precision are important factors.

## **Unsigned Encodings**

Unsigned encodings are binary representations of non-negative integers, critical in various computing applications.

## **Key Aspects**

- Binary Representation: Uses binary digits (bits) to represent integer values.
- Non-negative Values: Capable of representing only non-negative numbers.
- Range: The range of representable values depends on the number of bits used.
- Usage: Common in scenarios where negative values are not needed, such as memory addresses or certain types of counters.

Understanding unsigned encodings is vital for proper data handling and manipulation in low-level programming and system design.

#### Two's-Complement Encodings

Two's-complement encoding is a binary representation system for positive and negative integers, prevalent in computer systems.

## **Key Aspects**

- Representation: Uses the highest-order bit as a sign bit, with 0 for positive and 1 for negative.
- Range: Allows representation of integers in a symmetric range around zero.
- Arithmetic Operations: Simplifies arithmetic, as addition and subtraction can be performed uniformly without special handling of negative numbers.
- Usage: Standard in most computing systems for integer arithmetic.

Two's-complement encoding's simplicity in arithmetic operations makes it fundamental in digital computing and programming.

#### Conversions Between Signed and Unsigned

Conversion between signed and unsigned integers involves reinterpretation of binary data, with implications for numerical values.

### **Key Aspects**

- **Process:** The binary representation is kept unchanged, but the interpretation of the value differs.
- Range Considerations: Values may be interpreted differently due to the presence (or absence) of a sign bit.
- Importance in Programming: Requires careful consideration in software to avoid data corruption or unintended behavior.
- Use Cases: Common in low-level programming, interfacing with hardware, and systems programming.

Understanding these conversions is essential for accurate data handling and manipulation in various computing scenarios.

## Signed Versus Unsigned in C

The distinction between signed and unsigned integers in C is crucial for correct data representation and manipulation.

## **Key Aspects**

- Range Differences: Unsigned integers can represent larger non-negative numbers, while signed integers include negative values.
- Behavior in Arithmetic: Arithmetic operations may yield different results, particularly with overflow or underflow.
- Functionality: Choice affects functionality like comparison and bit manipulation.
- Best Practices: Selection depends on the program's requirements, considering range and intended arithmetic operations.

Understanding these aspects is essential for effective programming in C, avoiding errors related to integer overflow and data interpretation.

#### Expanding the Bit Representation of a Number

Expanding the bit representation of a number is a process used in computing to increase the number of bits representing a value.

## Key Aspects

- **Sign Extension:** Used for signed numbers, extends the sign bit to the new higher bits to preserve the number's sign.
- Zero Padding: For unsigned numbers, new higher bits are filled with zeros.
- Purpose: Allows for operations or storage in environments with larger word sizes.
- Importance: Critical for data integrity during operations like casting in programming or moving data between different systems.

Understanding bit representation expansion is crucial for ensuring data accuracy in various computing operations.

## Truncating Numbers

Truncating numbers is the process of reducing the bit representation of a number in computing, often leading to information loss.

#### **Key Aspects**

- Reduction of Bits: Involves cutting off the higher-order bits of a number.
- Potential Information Loss: Important information or precision may be lost during truncation.
- Usage Context: Common in operations where smaller data types are needed or when interfacing with systems that support lower word sizes.
- Risks: Can lead to incorrect or unexpected results if not handled carefully.

Careful consideration is needed when truncating numbers to avoid unintentional loss of information or errors.

The next section in this chapter is **Section 2.3: Integer Arithmetic**.

## Section 2.3: Integer Arithmetic

## **Unsigned Addition**

Unsigned addition is a basic arithmetic operation performed on non-negative integers in computing.

## **Key Aspects**

- No Sign Bit: Operands are considered non-negative, and there is no sign bit.
- Overflow: Occurs when the result exceeds the maximum value representable in the given bit width.
- Use Cases: Common in scenarios like memory addressing and certain algorithm implementations.

Understanding unsigned addition is essential for many areas of computer programming and system design.

#### Two's-Complement Addition

Two's-complement addition is used for adding signed integers in binary, crucial in computer arithmetic.

#### **Key Aspects**

- Handling Negative Numbers: Efficiently adds negative and positive integers.
- Overflow Detection: Special attention is needed to detect overflow, especially when adding two numbers with the same sign.
- Computational Efficiency: Simplifies the hardware design for arithmetic operations.

Understanding two's-complement addition is vital in system programming, algorithm design, and digital circuitry.

## Two's-Complement Negation

Two's-complement negation is the method of obtaining the negative equivalent of a binary number.

## Key Aspects

- Inversion and Addition: Involves inverting all the bits of the number (turning 0s to 1s and vice versa) and then adding 1.
- Symmetry in Range: Ensures a symmetric range of negative and positive numbers.
- Zero Special Case: The negation of zero is zero itself in this system.

Two's-complement negation is essential for representing negative numbers in binary and performing arithmetic operations.

## **Unsigned Multiplication**

Unsigned multiplication is the process of multiplying two non-negative integers in binary form.

## **Key Aspects**

- Binary Multiplication: Performs multiplication similar to the decimal system, but with binary numbers.
- No Sign Consideration: Both operands are treated as non-negative.
- Overflow Potential: The result might exceed the allocated bit width, leading to overflow.
- Usage in Computing: Essential in various applications where negative values are not required.

Understanding unsigned multiplication is important for accurate arithmetic operations in computer programming and digital logic design.

## Two's-Complement Multiplication

Two's-complement multiplication involves multiplying signed integers in binary, crucial for arithmetic with signed numbers.

## **Key Aspects**

- Sign Handling: Accounts for the sign of the numbers using two's-complement representation.
- Overflow Detection: Requires careful handling to detect and manage overflow conditions.
- Computational Method: Similar to unsigned multiplication, with additional steps to handle the signs of the operands.
- Usage: Widely used in computer systems for arithmetic operations involving negative numbers.

Understanding two's-complement multiplication is essential for performing arithmetic operations involving signed numbers in computing.

## Multiplying by Constants

Multiplying by constants in computing is an optimization technique for efficient arithmetic operations.

## **Key Aspects**

- Constant Operand: One of the multipliers is a known constant value.
- Optimization: Often optimized by compilers to use less resource-intensive operations.
- **Implementation:** Can be implemented using shifts and additions instead of standard multiplication.
- Usage: Common in scenarios where repetitive multiplication by a fixed number occurs.

This method is crucial for optimizing arithmetic operations in software development and digital logic.

## Dividing by Powers of 2

Dividing by powers of 2 in computing utilizes bit shifting for efficient arithmetic processing.

#### **Key Aspects**

- Bit Shift Operations: Rightward shift operations are used, where each shift effectively divides the number by 2.
- Efficiency: Much faster than standard division operations.
- Considerations: Care is needed to handle signed numbers correctly.
- Usage: Common in optimizations and in systems where processing resources are limited.

This technique is a fundamental optimization strategy in computer programming and digital logic design.

## Week 2: (1/22 - 1/28)



## Representing and Manipulating Information

## 2.0.1 Assigned Reading

The reading assignment for this week is:

• Computer Systems: Chapter 2.4

• Computer Systems: Chapter 3.1

• Computer Systems: Chapter 3.2

#### 2.0.2 Piazza

Piazza discussions are optional this week. (1/30/24)

#### 2.0.3 Lectures

The lecture videos for this week are:

- IEEE Floating Point  $\approx 20$  min.
- IEEE Floating Point: Examples  $\approx 24$  min.
- IEEE Floating Point: Rounding  $\approx 42$  min.
- Historical Perspective  $\approx 22$  min.
- Program Encodings  $\approx 26$  min.
- GDB Tutorial: Part I Basic Debugging  $\approx 17$  min.
- GDB Tutorial: Part II Printing And Examining Data  $\approx 18$  min.

The lecture notes for this week are:

- Integer Mathematical Operations And Memory Representations
- Floating Point
- Floating Point Examples
- Floating Point Rounding And Operations
- GDB Reference

## 2.0.4 Assignments

The assignment for this week is:

- Data Lab (1/30/24)
- Data Lab Extra Credit (1/30/24)
- Data Lab Interview (1/30/24)

### 2.0.5 Quiz

The quizzes for this week are:

• Chapter 2.4 Quiz • Chapter 2.4 Finalized Quiz (1/30/24)

## 2.0.6 Chapter Summary

The first chapter for this week is **Chapter 2: Representing And Manipulating Information**. The section that we are covering from this chapter is **Section 2.4: Floating Point**.

## Section 2.4: Floating Point

#### Overview

Floating point is a method used to represent and manipulate real numbers in computers. It allows for the representation of a vast range of values, from very large to very small, by using a fixed number of bits. A floating point number is typically represented by three components: the sign (indicating positive or negative), the exponent, and the fraction (or mantissa), which represents the precision of the number.

#### IEEE 754 Standard

- **Purpose**: The IEEE 754 standard is the most widely used standard for floating-point computation, and it defines the format for representing floating-point numbers and the rules for arithmetic operations.
- Formats: It includes several formats, but the two most common are single precision (32 bits) and double precision (64 bits).

## Floating Point Representation

- Single Precision (32-bit): Consists of 1 bit for the sign, 8 bits for the exponent, and 23 bits for the fraction.
- Double Precision (64-bit): Consists of 1 bit for the sign, 11 bits for the exponent, and 52 bits for the fraction.

#### Normalization

- **Purpose**: Normalization in floating-point representation ensures that the number is represented in the most efficient way, maximizing the precision.
- **Process**: It involves adjusting the exponent and fraction so that the fraction begins with a non-zero digit.

#### Special Values

- Zero: Represented by an exponent of all 0s and a fraction of all 0s.
- Infinity: Represented by an exponent of all 1s and a fraction of all 0s.
- NaN (Not a Number): Represented by an exponent of all 1s and a non-zero fraction.

#### **Precision And Rounding**

• **Issues**: Precision in floating-point representation is limited, which can lead to rounding errors in computations.

• Rounding Strategies: Several strategies exist to minimize these errors, such as round-tonearest or round-towards-zero.

## **Basics Of Fractional Binary Numbers**

- **Definition**: Fractional binary numbers are a way to represent numbers that are not whole numbers (i.e., fractions) in binary form.
- Representation: Just like whole numbers are represented in binary using powers of 2 ( $2^0$ ,  $2^1$ ,  $2^2$ , etc.), fractional binary numbers use negative powers of 2 ( $2^{-1}$ ,  $2^{-2}$ ,  $2^{-3}$ , etc.).

## **Binary Fractional Places**

- Right of the Decimal Point: Each place value to the right of the binary point (equivalent to the decimal point in base 10) represents a negative power of 2.
- **Example**: In the binary fractional number 0.101, the first digit after the binary point is  $2^{-1} = (0.5)$ , the second is  $2^{-2} = (0.25)$ , and so on.

## Converting To Decimal

- **Process**: To convert a binary fractional number to a decimal, multiply each digit by its corresponding power of 2 and sum the results.
- Example: The binary number 0.101 is calculated as  $(1 \cdot 2^{-1}) + (0 \cdot 2^{-2}) + (1 \cdot 2^{-3})$ , which equals 0.625 in decimal.

#### **Precision And Limitations**

- Finite Representation: Just like decimal fractions, binary fractions may not always have a finite representation. For example, the decimal fraction 0.1 (one-tenth) does not have a finite binary representation.
- **Precision Loss**: In computer systems, this can lead to precision loss, especially in calculations involving floating-point arithmetic.

#### Importance In Computing

- **Usage**: Understanding fractional binary numbers is essential in fields like digital signal processing, computer graphics, and scientific computing.
- **Floating-Point Numbers**: This concept is also fundamental to the representation of real numbers in computers, as used in floating-point formats.

## **Basics Of Fractional Binary Numbers Summary**

Fractional binary numbers are key to representing and manipulating non-integer numbers in computer systems. The process of converting between binary and decimal fractions is important for understanding how computers process and store decimal numbers. Awareness of the limitations and precision issues with binary fractions is crucial in various computing applications.

#### **IEEE Floating-Point Representation**

- Overview: IEEE Floating-Point Representation is a standard for representing and computing floating-point numbers, widely adopted in computer systems for numerical computations.
- **Standard**: The most common standard used is IEEE 754, which defines the format for floating-point numbers and the rules for floating-point arithmetic.

## Components Of IEEE Floating-Point Format

- **Sign Bit**: The first bit is the sign bit, with 0 for positive numbers and 1 for negative numbers.
- **Exponent**: Follows the sign bit, represents the exponent of the number. The exponent is in a 'biased' form.
- Fraction (or Mantissa): Represents the precision of the floating-point number. It's a binary fraction.

#### **IEEE 754 Formats**

- Single Precision: 32-bit format with 1 sign bit, 8 exponent bits, and 23 fraction bits.
- **Double Precision**: 64-bit format with 1 sign bit, 11 exponent bits, and 52 fraction bits.

## Normalization In Floating-Point Representation

- **Purpose**: Normalization maximizes the precision of the number and ensures that the floating-point representation is unique.
- **Process**: In normalized form, the fraction field represents a number greater than or equal to 1 and less than 2.

## Special Values In IEEE 754

- Zero: Represented by all bits zero in both the exponent and the fraction.
- Infinity: Represented by all bits one in the exponent and all bits zero in the fraction.
- NaN (Not a Number): Represented by all bits one in the exponent and a non-zero fraction.

#### Precision And Rounding Issues

- Limited Precision: The finite number of bits limits the precision of floating-point numbers, which can lead to rounding errors.
- Rounding Strategies: Various strategies, like round-to-nearest or round-towards-zero, are used to minimize these errors.

#### Importance Of IEEE Floating-Point Representation

- **Significance**: The IEEE 754 standard is critical in ensuring consistency and accuracy in floating-point computations across different computing platforms.
- **Applications**: Used extensively in scientific computing, graphics, and other fields requiring precise numerical computations.

#### **IEEE Floating-Point Representation Summary**

IEEE Floating-Point Representation is essential for accurate and consistent numerical computation in computer systems. The IEEE 754 standard outlines specific formats and rules for floating-point arithmetic, addressing issues of precision and rounding. Understanding this representation is crucial for fields that rely heavily on numerical computations.

## Example Numbers In IEEE Floating-Point Representation

Below are some examples of floating point numbers in IEEE floating-point format to demonstrate the application of the standard.

## Example 1 - Representation of a Positive Number

- **Decimal Number**: Let's consider the decimal number 6.25.
- Binary Equivalent: In binary, 6.25 is represented as 110.01.
- Normalized Form: The normalized form in IEEE format is  $1.1001 \times 2^2$ .
- IEEE Representation: Assuming single precision, the sign bit is 0, the exponent is 2 + 127 = 129 (which is 10000001 in binary), and the mantissa is the binary fraction 1001000... (23 bits total).

## Example 2 - Representation of a Negative Number

- **Decimal Number**: Consider the decimal number -10.75.
- Binary Equivalent: In binary, -10.75 is -1010.11.
- Normalized Form: The normalized form in IEEE format is  $-1.01011 \times 2^3$ .
- **IEEE Representation**: In single precision, the sign bit is 1, the exponent is 3+127=130 (which is 10000010 in binary), and the mantissa is 0101100... (23 bits).

## Example 3 - Special Values

- **Positive Infinity**: Represented by a sign bit of 0, an exponent of all 1s, and a fraction of all 0s.
- NaN (Not a Number): Represented by an exponent of all 1s and a non-zero fraction.

#### **Example Numbers Summary**

These examples illustrate how different types of numbers, including special values, are represented in the IEEE floating-point format. Understanding these examples helps in comprehending the practical application of IEEE 754 standard in computer systems for representing various numerical values accurately.

#### Rounding In IEEE Floating-Point Representation

This subsection discusses the concept of rounding in the context of IEEE floating-point representation, highlighting its significance and methods.

#### Significance of Rounding

- **Purpose**: Rounding is crucial in floating-point arithmetic because it allows us to fit numbers into a finite number of bits.
- **Precision Limitation**: Due to the limited number of bits in the mantissa, not all decimal numbers can be represented exactly, necessitating rounding.

#### Rounding Methods

- Round to Nearest (Ties to Even): The most commonly used method. It rounds to the nearest value; if the number falls exactly in the middle, it is rounded to the nearest even number.
- Round Toward Zero: Rounds the number towards zero, effectively truncating the fractional part.
- Round Up (Toward Positive Infinity): Always rounds numbers up.
- Round Down (Toward Negative Infinity): Always rounds numbers down.

## Implications of Rounding

- Rounding Errors: Can introduce small errors in computations, which may accumulate in successive calculations.
- Importance in Algorithms: Understanding how rounding works is essential in algorithm design, particularly in numerical methods and computer graphics.

## Rounding Summary

Rounding in IEEE floating-point representation is a necessary process due to the finite representation of numbers. Different rounding methods are used based on the context and requirements of the computation. It is essential to be aware of rounding errors and their potential impact on the accuracy of numerical computations in computer systems.

## Floating Point in C

This subsection focuses on the representation and handling of floating-point numbers in the C programming language, emphasizing its adherence to the IEEE standard and specific characteristics.

## **IEEE Standard Compliance**

- Compatibility: C language supports IEEE 754 standard for floating-point representation, ensuring portability and consistency across platforms.
- Data Types: The primary data types for floating-point numbers in C are float, double, and long double.

#### Data Types and Precision

- float: Usually represents a single precision floating-point number (32 bits).
- double: Represents a double precision floating-point number (64 bits), offering higher precision.
- long double: Provides even higher precision than double, its size and precision can vary depending on the compiler and platform.

#### **Arithmetic Operations**

- Operations: Includes addition, subtraction, multiplication, division, and remainder.
- Accuracy: Precision in calculations is limited by the data type used, with double and long double offering greater accuracy.

## **Handling Special Values**

- Infinity and NaN: C can represent special values like infinity and NaN (Not a Number) in accordance with IEEE 754.
- Functions: Functions like isinf() and isnan() are used to check for these special values.

#### Limitations and Considerations

- **Precision Limits**: The finite representation of floating-point numbers in C can lead to rounding errors and limitations in numerical accuracy.
- Best Practices: Careful selection of data types and awareness of precision limitations are crucial in numerical computing with C.

## Floating Point in C Summary

The C programming language provides comprehensive support for floating-point arithmetic in line with the IEEE 754 standard. Understanding the characteristics and limitations of different floating-point data types (float, double, and long double) is essential for effective numerical programming in C. Special values like infinity and NaN are also supported, with functions available for their detection and handling.

The next chapter that we are covering this week is **Chapter 3: Machine-Level Representation Of Programs**. The first section that we are covering from this chapter is **Section 3.1: A Historical Perspective**.

## Section 3.1: A Historical Perspective

#### A Historical Perspective

This section provides a historical overview of the machine-level representation of programs, tracing the evolution of programming from early machine languages to modern high-level languages.

## Early Machine Languages

- First Computers: Early computers were programmed in machine language, a low-level programming language understood directly by the computer's hardware.
- **Binary Coding**: Programs were written in binary code, which was tedious and error-prone, requiring programmers to have deep knowledge of the hardware.

## Assembly Languages and Assemblers

- Introduction of Assembly Language: To simplify machine-level programming, assembly languages were developed. These languages use mnemonic codes and symbols to represent machine-level instructions.
- **Assemblers**: Programs called assemblers were created to convert assembly language code into machine language automatically.

#### **High-Level Languages**

• **Development**: With the increasing complexity of software, high-level languages like Fortran, C, and Java were developed, allowing programmers to write code in a more human-readable form.

• Compilers: Compilers translate high-level language code into machine language, bridging the gap between human logic and machine instructions.

## Modern Programming

- Advancements: Modern programming involves a mix of high-level languages for application development and low-level languages for system-level programming.
- Integrated Development Environments (IDEs): These environments provide tools that aid in writing, testing, and debugging code, further simplifying the programming process.

## A Historical Perspective Summary

The evolution from early machine languages to high-level programming languages represents a significant advancement in the field of computing. This progression has made programming more accessible and efficient, enabling the development of complex software systems. It highlights the importance of understanding both high-level and low-level programming concepts in computer science.

The next section that we are covering from this chapter is **Section 3.2: Program Encodings**.

## Section 3.2: Program Encodings

#### Machine-Level Code

This topic delves into the specifics of machine-level code, discussing its nature, characteristics, and importance in the context of program encodings.

#### Nature of Machine-Level Code

- **Direct Hardware Interaction**: Machine-level code interacts directly with the computer's hardware, providing control over every operation executed by the processor.
- Binary Format: It is expressed in binary, making it the lowest level of code that is directly executed by the computer's CPU.

#### Characteristics of Machine-Level Code

- Efficiency: Machine-level code is highly efficient as it is tailored to the specific architecture of the processor.
- Complexity: Due to its low-level nature, it is more complex and harder to read compared to high-level programming languages.

## Role in Program Execution

- Execution by Processor: The CPU executes machine-level code directly, translating the binary instructions into actions.
- Foundation for Higher-Level Languages: All high-level language programs are eventually converted to machine-level code for execution.

## Importance in Computer Systems

- **Performance Optimization**: Understanding machine-level code is crucial for optimizing the performance of software, especially in system-level programming.
- Hardware-Specific Programming: It is essential for programming that requires direct interaction with the hardware, such as device drivers and embedded systems.

## Machine-Level Code Summary

Machine-level code represents the most fundamental form of program encoding, directly executable by the CPU. Its efficiency and direct hardware interaction make it indispensable for performance-critical and hardware-specific applications. However, its complexity and low-level nature limit its use to specialized areas of programming.

#### Code Examples

This topic focuses on providing practical examples of machine-level code, illustrating how various programming constructs are represented at this fundamental level.

#### **Basic Instructions**

- Arithmetic Operations: Examples include addition, subtraction, multiplication, and division, showing how these basic operations are encoded in machine-level language.
- **Data Movement**: Illustrates instructions for moving data between the CPU and memory locations, and within CPU registers.

#### Control Structures

- Conditional Execution: Demonstrates how conditional statements like if-else are implemented in machine code.
- Loops: Shows the encoding of loop constructs like for and while, detailing how iteration is managed at the machine level.

#### Function Calls and Returns

- Calling Mechanism: Explores how functions are called, including passing arguments and the setup of the call stack.
- Return Process: Describes how control is returned to the calling function, including stack unwinding and return value handling.

#### Complex Constructs

- Arrays and Structures: Provides examples of how more complex data structures like arrays and structs are handled and accessed in machine code.
- **Pointer Operations**: Examines the representation and manipulation of pointers at the machine level, crucial for dynamic memory management.

#### Code Examples Summary

These code examples serve to bridge the gap between high-level programming concepts and their low-level machine-level representations. Understanding these examples is key to comprehending the fundamentals of how high-level constructs are translated into executable machine code, providing insight into the workings of compilers and the efficiency of different coding practices.

## Week 3: (1/29 - 2/4)



## Machine-Level Representation of Programs

## 3.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 3.3
- Computer Systems: Chapter 3.4
- Computer Systems: Chapter 3.5

#### 3.0.2 Piazza

Piazza discussions are optional this week. (2/6/24)

## 3.0.3 Lectures

The lecture videos for this week are:

- Accessing Information  $\approx 22$  min.
- Arithmetic Operations  $\approx 15$  min.
- Control Condition Flags  $\approx 14$  min.
- Control If / Then / Else  $\approx 16$  min.
- GDB Tutorial: Part III Machine Instructions  $\approx 17$  min.
- GDB Tutorial: Part IV Online Disassembler  $\approx 10$  min.

#### The lecture notes for this week are:

- Machine-Level Programming I Architecture, Assembly And Object Code Lecture Notes
- Machine-Level Programming I Basics Arithmetic And Logical Operations Lecture Notes
- Machine-Level Programming I Basics I Lecture Notes
- Machine-Level Programming I Basics II Lecture Notes
- Machine-Level Programming II Control If, Then, Else Lecture Notes
- Machine-Level Programming II Control Loops Lecture Notes
- Machine-Level Programming II Control Switch Statement Lecture Notes
- Machine-Level Programming II Control

## 3.0.4 Assignments

The assignment for this week is:

- Bomb Lab (2/20/24)
- Bomb Lab Extra Credit (2/20/24)
- Bomb Lab Interview (2/20/24)

## 3.0.5 Quiz

The quizzes for this week are:

- Chapter 3.1 3.5 Quiz Chapter 3.1 3.5 Finalized Quiz (2/6/24)
- Practice Exam 1 Chapter 2

#### 3.0.6 Chapter Summary

The chapter that is covered this week is **Chapter 3: Machine-Level Representation of Programs**. The first section that is being covered this week is **Section 3.3: Data Formats**.

#### Section 3.3: Data Formats

#### Overview

When delving into the machine-level representation of programs, particularly through the lens of assembly code, we're peering into the realm where high-level programming constructs are distilled into the fundamental instructions that a computer's CPU can directly execute. Assembly language serves as a thin veneer over the raw binary code, providing mnemonic codes (like MOV, ADD, SUB) that correspond to machine operations, and it operates directly on the CPU's registers and memory. In this context, understanding data formats becomes crucial, as it's at this level that the abstract data types we use in higher-level languages (like integers, floating points, and characters) are translated into a form the hardware can understand and manipulate.

Data formats at the machine level are intimately tied to the architecture of the CPU—the size of registers, the memory addressing capabilities, and the instruction set architecture (ISA) define how data is represented, accessed, and manipulated. For instance, a 32-bit architecture might naturally work with 32-bit integers and addresses, while a 64-bit architecture expands these capabilities. This directly impacts how data of various types is stored in memory, how arithmetic and logical operations are performed, and how data is moved between memory and registers.

#### **Key Concepts in Machine-Level Data Formats**

### • Registers and Memory Addressing

- Registers: Small, fast storage locations directly inside the CPU used to hold temporary data, including operands for arithmetic operations, addresses for memory operations, and status codes.
- Memory Addressing: The method by which instructions refer to memory locations. In assembly language, instructions explicitly state which registers or memory addresses are involved in operations.

#### • Integer Representation

- At the machine level, integers are typically represented in binary form using a fixed number of bits (e.g., 8, 16, 32, 64 bits). The representation can be signed or unsigned, with signed integers often using two's complement notation.

#### • Floating-Point Representation

 Floating-point numbers are represented using a format that can accommodate a wide range of values by separating the number into sign, exponent, and fraction parts, in accordance with the IEEE 754 standard.

#### • Character and String Representation

 Characters are stored using specific encoding schemes (like ASCII or Unicode) that map characters to binary values. Strings are sequences of characters, typically terminated by a special character (like backslash 0 in C).

#### • Instruction Encoding

- Every machine instruction has a specific binary representation, which includes opcode (operation code) and operands (data or memory addresses the operation applies to). The exact format depends on the CPU's instruction set architecture (ISA).

#### • Data Alignment and Padding

Data alignment refers to arranging data in memory at addresses that match its size to
optimize access speed. Padding may be added to ensure alignment, affecting the layout of
structures in memory.

## • Addressing Modes

Assembly languages offer various addressing modes to specify operands for instructions.
 These can include immediate (direct value), register (value in a register), direct (memory address), and indirect (address stored in a register) modes, among others.

#### • Endianness

The order in which bytes are arranged in memory for multi-byte data types. Big endian stores the most significant byte at the smallest address, while little endian does the opposite. This affects how data is read and written at the byte level.

The next section that will be covered from this chapter is **Section 3.4: Accessing Information**.

#### Section 3.4: Accessing Information

#### **Operand Specifiers**

Operand specifiers in assembly language tell the processor exactly where to find the operands needed for an operation or where to store the results of an operation. These specifiers can refer to immediate values (constants), CPU registers, or memory locations. The syntax and capabilities of operand specifiers vary between different assembly languages, influenced by the CPU architecture (e.g., x86, ARM). At their core, they allow the programmer to precisely control data movements and operations, which is essential for efficient low-level programming.

## **Key Concepts**

#### • Immediate Operands

- Immediate operands are constants encoded directly in the instructions. They are used for operations that do not change, like adding a specific number to a register.
- Example: ADD EAX, 5 // Adds 5 to the value in the EAX register.

#### • Register Operands

- Register operands specify CPU registers as either source or destination for data. Registers offer the fastest way to access data.
- Example: MOV EBX, EAX // Copies the value from the EAX register to the EBX register.

#### • Memory Operands

- Memory operands allow instructions to specify data in memory, accessed via addressing modes (direct, indirect, indexed, etc.).
- Example: MOV ECX, <code>[EBX]</code> // Moves the value at the memory location pointed to by <code>EBX</code> into <code>ECX</code>.

### • Direct Addressing

- Specifies the memory address directly in the instruction. It's used for accessing global variables or fixed data structures.
- Example: MOV EDX, [0x0040A0C8] // Moves the value from a specific memory address into the EDX register.

#### • Indirect Addressing

- Uses a register to hold the memory address of the operand. This method is flexible and used for accessing data structures like arrays.
- Example: MOV EAX, [ESI] // Moves the value from the memory location pointed to by ESI into EAX.

## • Indexed Addressing (with Displacement)

- Combines a base address in a register with an offset (displacement) to access an array element or a structure field.
- Example: MOV AL, [EBX + 4] // Moves the value from memory at the address EBX+4 into the AL register.

## Sample Assembly Code with Comments

Consider a simple sequence of assembly instructions (x86 syntax) with comments explaining each step:

```
MOV EAX, 10; Load immediate value 10 into the EAX register.

ADD EAX, [EBP-4]; Add the value at the memory location EBP-4 to EAX.

MOV [EDI], EAX; Store result from EAX into the memory location pointed to EDI.

MOV ECX, [EDI]; Copy the result from memory (where EDI points) into ECX.
```

#### In this sequence:

- The first line loads a constant value (10) directly into the EAX register.
- The second line adds to EAX the value stored in memory at an address computed by subtracting 4 from the EBP register.
- The third line stores the result of the addition back into memory, at the address contained in the EDI register.
- The final line moves the stored value from the memory location back into the ECX register for further use.

#### **Introduction to Data Movement Instructions**

Data movement instructions in assembly language are used to load data from memory into registers, store data from registers back to memory, and transfer data between registers. These instructions are essential for any operation that requires data manipulation, as they control how data is accessed and where it is stored during program execution. Different types of data movement instructions are designed to handle various tasks, such as initializing registers, moving data around in memory, and setting up the environment for operations like arithmetic or logic functions.

#### **Key Concepts**

#### • MOV Instruction

- The MOV instruction copies data from one location to another without modifying the source.
   It's the most basic form of data movement.
- Example: MOV EAX, EBX // Copies the value from EBX to EAX.

#### • PUSH and POP Instructions

- These instructions work with the stack, a last-in-first-out (LIFO) data structure. PUSH adds a value to the top of the stack, while POP removes the top value from the stack.
- Example: PUSH EAX // Saves the current value of EAX onto the stack.
- Example: POP EAX // Restores the last saved value from the stack into EAX.

## • LEA (Load Effective Address) Instruction

- The LEA instruction calculates the address of a memory operand and stores it in a register.
   It's often used for pointer arithmetic.
- Example: LEA EAX, [EBX+ECX\*2] // Calculates the address EBX + ECX\*2 and stores it in EAX.

#### • XCHG Instruction

- The XCHG instruction swaps the values between two operands. It can be used for register-register, register-memory, or memory-memory transfers.
- Example: XCHG EAX, EBX // Swaps the values of EAX and EBX.

### • IN and OUT Instructions

- These instructions are used for communication with I/O devices. IN reads data from an I/O device into a register, while OUT writes data from a register to an I/O device.
- Example: IN AL, DX // Reads data from the I/O port specified by DX into the AL register.
- Example: OUT DX, AL // Writes data from the AL register to the I/O port specified by DX.

## Sample Assembly Code with Comments

Here's a simple sequence of assembly instructions (assuming x86 syntax) with comments to illustrate data movement:

```
MOV EBX, 1000 ; Move the immediate value 1000 into the EBX register.

PUSH EBX ; Save the value in EBX onto the stack.

MOV EAX, [MYVAR] ; Copy the value from memory location labeled MYVAR into EAX.

LEA EDI, [EAX+4] ; Calculate the address EAX+4 and store it in EDI.

POP EBX ; Restore the previous value from the stack into EBX.

KCHG EAX, EDI ; Swap the values of EAX and EDI.
```

#### In this sequence:

- The first instruction initializes EBX with a value of 1000.
- The second instruction saves this value on the stack for later use.
- The third instruction moves a value from a memory location into EAX.
- The fourth calculates a new address based on the value in EAX and stores it in EDI.
- The fifth restores EBX's value from the stack.
- The final instruction swaps the contents of EAX and EDI, demonstrating a simple data exchange.

## Introduction to Pushing and Popping Stack Data

In assembly language, the PUSH and POP instructions are used to add and remove data from the stack, respectively. The stack itself is a crucial part of the program's memory space, managed with a stack pointer (SP or ESP in x86 architecture), which tracks the top of the stack. The PUSH instruction decreases the stack pointer (since the stack typically grows towards lower memory addresses) and places a value at the new top of the stack. Conversely, the POP instruction removes the value from the top of the stack and increases the stack pointer, making space for new data.

#### • PUSH Instruction

- The PUSH instruction places data onto the top of the stack and adjusts the stack pointer accordingly. It's commonly used to save the current state of registers before calling a function.
- Example: PUSH EAX // Saves the current value of the EAX register onto the stack.

#### • POP Instruction

- The POP instruction removes the topmost data from the stack, placing it into a specified register or memory location, and adjusts the stack pointer back. It's often used to restore register values after a function call.
- Example: POP EAX // Restores the last saved value from the stack into the EAX register.

#### • Function Call Conventions

— The PUSH and POP instructions are integral to function call conventions, where PUSH is used to pass arguments to functions and to save the return address, and POP is used to clean up the stack after the function returns.

#### • Stack Frame Management

 Each function call creates a new stack frame, which includes the function's local variables, arguments, and the return address. PUSH and POP are used to manage these frames.

#### • Using PUSH and POP for Register Preservation

 Registers are saved on the stack before calling a function to preserve their values for later use, ensuring that called functions don't overwrite values needed by the caller.

## Sample Assembly Code with Comments

Here's a sample code snippet illustrating the use of PUSH and POP in a function call context:

```
PUSH EAX ; Save EAX register value on the stack.

PUSH EBX ; Save EBX register value on the stack.

CALL MyFunction ; The return address is automatically pushed onto stack.

POP EBX ; Restore original EBX value.

Restore original EAX value.
```

#### In this sequence:

- The PUSH instructions save the current values of the EAX and EBX registers before the function call, ensuring that their values are not lost if MyFunction modifies them.
- The CALL instruction pushes the return address onto the stack and jumps to MyFunction.
- After MyFunction executes and returns, the POP instructions restore the original values
  of EAX and EBX, ensuring the calling function can continue using these registers without
  loss of data.

The last section from this chapter for the week is **Section 3.5**: **Arithmetic and Logical Operations**.

## Section 3.5: Arithmetic and Logical Operations

#### Overview

Arithmetic and Logical Operations in assembly language programming covers the fundamental instructions that perform mathematical and logical computations directly on the CPU. These operations include basic arithmetic instructions like addition (ADD), subtraction (SUB), multiplication (MUL), and division (DIV), as well as logical instructions such as AND, OR, XOR, and NOT. Arithmetic operations are used to perform calculations on data, while logical operations manipulate data at the bit level, often used for testing conditions, masking, setting, or clearing specific bits. Both sets of operations are crucial for implementing the logic of higher-level constructs like loops, conditionals, and complex calculations in low-level programming. These instructions work directly with the CPU's registers and memory, providing the efficiency and control necessary for system programming, performance-critical applications, and hardware interfacing.

## Load Effective Address

The LEA (Load Effective Address) instruction in assembly language is a powerful and versatile operation, primarily used for arithmetic and addressing calculations without affecting the processor's flags. Unlike other arithmetic instructions, LEA calculates the address of a memory operand and stores it in a register. This makes it uniquely useful for pointer arithmetic, constructing addresses, and even performing certain arithmetic operations quickly and efficiently.

#### • Efficiency and Use Cases

- The LEA instruction is efficient for calculations that involve addresses or constants, as it bypasses the arithmetic logic unit (ALU) and does not alter the status flags.

 Example: LEA EBX, [EAX + 4\*ECX] // Calculates the address EAX + 4\*ECX and stores it in EBX.

#### • Arithmetic Operations

- Although primarily intended for address calculations, LEA can be creatively used for certain arithmetic operations, such as multiplication by constants (2, 3, 4, etc.) and addition.
- Example: LEA EDX, [EAX + EAX\*2] // Multiplies EAX by 3 and stores the result in EDX.

#### • Pointer Arithmetic

- LEA is especially useful in pointer arithmetic, where it can calculate new pointer values without loading the actual data at the pointer address.
- Example: LEA ESI, [ESI + EDI\*8] // Moves ESI pointer by EDI elements, assuming each element is 8 bytes.

## Sample Assembly Code with Comments

The following code snippet demonstrates the use of LEA for both address calculation and simple arithmetic:

```
LEA EDI, [EAX + 4*EBX]; Calculates the address EAX + 4*EBX and stores in EDI.
LEA ECX, [ECX + ECX*2]; Doubles ECX and adds the original ECX, result in ECX.
```

In this example:

- The first line demonstrates address calculation, effectively using LEA to prepare an address for a later memory operation without performing any data movement.
- The second line shows how LEA can be used for arithmetic operations, in this case, tripling the value of ECX without affecting the CPU's flags.

#### **Unary and Binary Operations**

Unary and binary operations form the core of arithmetic and logical manipulations in assembly language programming. Unary operations, such as INC (increment) and DEC (decrement), operate on a single operand, modifying its value directly. Binary operations involve two operands and include a wide range of instructions for performing arithmetic (e.g., ADD, SUB) and logical (e.g., AND, OR, XOR) operations.

## • Unary Operations

- Unary operations modify the value of a single operand. These operations are efficient for simple increments or decrements.
- Example: INC EAX // Increments the value in EAX by 1.
- Example: DEC EDX // Decrements the value in EDX by 1.

#### • Binary Operations

- Binary operations perform arithmetic or logical computations using two operands. They are fundamental for mathematical calculations and data manipulation.
- Arithmetic Example: ADD EAX, EBX // Adds the value in EBX to EAX.
- Logical Example: AND ECX, EDX // Performs a bitwise AND on ECX and EDX, storing the result in ECX.

#### • Use in Programming

 These operations are essential for implementing the logic of higher-level programming constructs, such as loops, conditionals, and complex calculations, directly in assembly language.

# Sample Assembly Code with Comments

Here's a sample code snippet that demonstrates the use of unary and binary operations in an assembly context:

```
I INC EAX ; Increment the value in EAX by 1.

ADD EAX, EBX ; Add the value in EBX to EAX, result stored in EAX.

AND EAX, Oxff ; Perform a bitwise AND between EAX and Oxff.

DEC EAX ; Decrement the value in EAX by 1.
```

# In this sequence:

- The INC and DEC instructions demonstrate unary operations, directly modifying the value of the operand.
- The ADD instruction exemplifies a binary arithmetic operation, combining two values into
  one.
- The AND instruction shows a binary logical operation, useful for manipulating bits within operands.

# **Shift Operations**

Shift operations are crucial in assembly language for manipulating data at the bit level. These operations include logical shifts (SHL for left shift and SHR for right shift) and arithmetic shifts (SAL for arithmetic left shift, which is synonymous with SHL, and SAR for arithmetic right shift). Logical shifts are used for binary multiplication or division by powers of two and for bit manipulation tasks, while arithmetic shifts also consider the sign of signed integers, preserving the number's sign while shifting.

# • Logical Shift Operations

- Logical shift operations move bits left or right, introducing zeros to fill the vacated bit positions.
- Left Shift Example: SHL EBX, 1 // Multiplies the value in EBX by 2.
- Right Shift Example: SHR EBX, 1 // Divides the value in EBX by 2, discarding the remainder.

# • Arithmetic Shift Operations

- Arithmetic shift operations also shift bits left or right but preserve the sign bit for right shifts, making them suitable for signed numbers.
- Arithmetic Left Shift Example: SAL EAX, 2 // Multiplies the value in EAX by 4, preserving the sign.
- Arithmetic Right Shift Example: SAR EDX, 2 // Divides the value in EDX by 4, preserving the sign and rounding towards negative infinity.

# • Use in Efficient Computing

 Shift operations are particularly efficient for performing quick multiplication or division by powers of two and for bit masking operations, critical in low-level programming and algorithms.

# Sample Assembly Code with Comments

Below is a demonstration of shift operations applied in an assembly code snippet:

```
SHL ECX, 3; Left shift ECX by 3
SAR EDI, 2; Arithmetic right shift EDI by 2
SHR EAX, 1; Logical right shift EAX by 1
```

In this example:

- The SHL instruction is used for a binary multiplication by 8, showcasing the efficiency of left logical shifts for scaling values by powers of two.
- The SAR instruction demonstrates the use of arithmetic right shifts for signed division, ensuring the sign bit is preserved.
- The SHR instruction illustrates a logical right shift for binary division by 2, applicable for unsigned numbers or when the sign is irrelevant.

# **Special Arithmetic Operations**

Special arithmetic operations in assembly language extend beyond the basic arithmetic instructions to include operations such as MUL (unsigned multiply), IMUL (signed multiply), DIV (unsigned divide), and IDIV (signed divide). These operations are critical for performing multiplication and division on both signed and unsigned integers, accommodating the nuances of integer arithmetic. Additionally, instructions like INC (increment) and DEC (decrement) offer optimized pathways for adding or subtracting one, which is a common operation in loops and iterative processes.

# Multiplication and Division

- MUL and IMUL perform multiplication on unsigned and signed integers, respectively, potentially using multiple registers to store the result due to the increased result size.
- DIV and IDIV execute division operations, handling both quotient and remainder. These
  instructions require careful preparation of registers before execution to accommodate the
  result.

# • Optimized Increment and Decrement

 INC and DEC instructions provide efficient means to increase or decrease a value by one, crucial for loop counters and conditional operations without affecting carry flag.

# • Application in Complex Calculations

 These operations are indispensable for implementing complex mathematical functions, algorithms, and handling large numbers or precise calculations in low-level programming.

# Sample Assembly Code with Comments

Here's an illustrative assembly code snippet showcasing special arithmetic operations:

```
I IMUL EBX, ECX ; Signed multiply ECX by EBX, result in EBX.

IDIV EDX ; Accumulator by EDX, quotient in EAX, remainder in EDX.

INC ESI ; Increment the value in ESI by 1.

DEC EDI ; Decrement the value in EDI by 1.
```

# In this sequence:

- The IMUL instruction demonstrates signed multiplication, accommodating scenarios where operand signs may vary.
- The IDIV instruction highlights the handling of signed division, preparing for both quotient and remainder results.
- The INC and DEC instructions showcase optimized operations for incrementing and decrementing, pivotal in controlling loops and iterations.



# Week 4: (2/5 - 2/11)



# Machine-Level Representation of Programs

# 4.0.1 Assigned Reading

The reading assignment for this week is:

• Computer Systems: Chapter 3.6

• Computer Systems: Chapter 3.7

# 4.0.2 Piazza

Piazza discussions are optional this week. (2/13/24)

# 4.0.3 Lectures

The lecture videos for this week are:

- Control Loops  $\approx 18$  min.
- Control Switch Statements  $\approx 18$  min.
- Control PC Relative Switch Statements  $\approx 29$  min.
- Procedures Overview  $\approx 12$  min.
- Procedures Stack Based Languages  $\approx 13$  min.
- Procedures Calling Conventions  $\approx 12$  min.
- Procedures Recursion  $\approx 12$  min.

#### The lecture notes for this week are:

- Machine-Level Programming III Procedures Stack Sase Languages Lecture Notes
- Machine-Level Programming III Procedures Lecture Notes
- Machine-Level Programming IV Data Arrays Lecture Notes
- Machine-Level Programming IV Data Pointers Lecture Notes
- Machine-Level Programming IV Data Structs And Unions Lecture Notes
- Machine-Level Programming IV Procedures Passing Data Lecture Notes
- Machine-Level Programming IV Procedures Recursion Lecture Notes
- Machine-Level Programming V Buffer Overflows And Attacks Lecture Notes
- Machine-Level Programming V Buffer Overflows And Attacks Worms, Viruses And ROP Lecture Notes

# 4.0.4 Assignments

The assignment for this week is:

- Bomb Lab (2/20/24)
- Bomb Lab Extra Credit (2/20/24)
- Bomb Lab Interview (2/20/24)

# 4.0.5 Quiz

The quizzes for this week are:

- Chapter 3.6 Quiz Chapter 3.6 Finalized Quiz (2/13/24)
- Chapter 3.7 Quiz Chapter 3.7 Finalized Quiz (2/13/24)

#### 4.0.6 Exam

The exam for this week is:

- Practice Exam 1
- Exam 1 Exam 1 Data Representation (2/7/24)

# 4.0.7 Chapter Summary

The chapter that is being covered this week is **Chapter 3: Machine-Level Representation of Programs**. The first section that is being covered from this chapter this week is **Section 3.6: Control**.

# Section 3.6: Control

#### **Condition Codes**

Condition codes, also known as flags, are special purpose registers used by the CPU to store the results of operations, particularly to indicate the status of arithmetic and logical operations. These flags are set or cleared automatically by the CPU following the execution of an instruction, and they can be tested through various conditional jump instructions to make decisions in a program. Understanding condition codes is crucial for writing efficient low-level code, as they directly influence the flow of control in programs.

# **Key Concepts**

- Zero Flag (ZF)
  - Indicates whether an operation resulted in a zero value. It is set if the result of an operation is zero.
  - Example: If an addition operation results in 0, the ZF is set.
- Sign Flag (SF)
  - Reflects the sign of the result of the last operation. It is set if the result is negative.
  - Example: If a subtraction operation results in a negative value, the SF is set.
- Carry Flag (CF)
  - Used to indicate an overflow in unsigned arithmetic operations. It is set if the operation causes a carry out of the most significant bit.

 Example: In an addition of two large unsigned numbers that results in a carry, the CF is set.

# • Overflow Flag (OF)

- Indicates an overflow condition for signed arithmetic operations. It is set if the operation produces a result too large for the destination to hold.
- Example: If the result of signed addition exceeds the range representable by the operand size, the OF is set.

# • Parity Flag (PF)

- Indicates whether the number of set bits in the result is odd or even. It is set if the number
  of set bits is even.
- Example: After an operation, if the result has an even number of 1 bits, the PF is set.

# Utilizing Condition Codes in Control Flow

Condition codes are extensively used to control the flow of a program based on the outcomes of operations. For instance, conditional jump instructions can be used to execute different sections of code depending on the status of these flags:

```
CMP EAX, EBX; Compare EAX and EBX.

JE equal_label; Jump to equal_label if EAX is equal to EBX (ZF is set).

JG greater_label; Jump to greater_label if EAX > EBX (SF=OF and ZF is clear).

JL less_label; Jump to less_label if EAX < EBX (SF!=OF).
```

In this example, the CMP instruction sets condition codes based on the comparison of EAX and EBX. The subsequent instructions test these flags to determine the program's flow, demonstrating how condition codes are used to implement decision-making in assembly language.

# Accessing the Condition Codes

Accessing the condition codes, also known as flags, is a fundamental aspect of controlling program flow in assembly language. These codes are not directly accessible in the same way as general-purpose registers, but their effects can be observed and utilized through specific instructions designed for conditional execution. The ability to test these flags allows for the implementation of conditional branching, loops, and other control structures based on the outcomes of previous operations.

# **Key Concepts**

#### • SET Instructions

- Instructions like SETZ, SETNZ, SETG, and SETL set a byte to 1 or 0 based on the condition codes. These instructions allow the condition codes to influence the values in registers or memory.
- Example: SETZ AL // Sets the AL register to 1 if ZF is set, indicating the previous operation resulted in zero.

# • Conditional Jump Instructions

- Instructions such as JE, JNE, JG, and JL allow the program to jump to different sections of code based on the status of condition codes.
- Example: JE target\_label // Jumps to target\_label if ZF is set.

# • Conditional Move Instructions

- Instructions like CMOVZ, CMOVNZ, CMOVG, and CMOVL conditionally move data between registers based on the condition codes without changing the flow of the program.
- Example: CMOVZ EAX, EBX // Moves the value from EBX to EAX if ZF is set.

# • Loop Instructions

- Instructions like LOOP, LOOPE, and LOOPNE decrement the counter and jump to a label if the counter is not zero and optionally if a condition code is in a specific state.
- Example: LOOPE loop\_start // Loops to loop\_start as long as ZF is set and the counter is not zero.

# Practical Example of Accessing Condition Codes

The following example illustrates how condition codes can be accessed and used for implementing a simple conditional operation in assembly:

```
CMP EAX, EBX ; Compare EAX to EBX.

JNE not_equal ; Jump if not equal (ZF is clear).

MOV ECX, 1 ; If equal, set ECX to 1.

JMP end ; Jump to the end of the condition.

not_equal:

MOV ECX, 0 ; If not equal, set ECX to 0.

end:

Continue execution...
```

This code segment demonstrates the use of CMP to set condition codes based on the comparison of two registers, followed by JNE to branch the program flow based on the Zero Flag. This is a fundamental example of accessing and utilizing condition codes to influence program behavior.

#### Jump Instructions

Jump instructions are a critical component of assembly language programming, enabling the implementation of control flow mechanisms such as loops, conditional execution, and function calls. These instructions alter the normal sequential execution of instructions by transferring control to another part of the program based on either unconditional or conditional logic. Understanding jump instructions is essential for creating dynamic and efficient programs in assembly language.

# **Key Concepts**

# • Unconditional Jump

- The JMP instruction is used for unconditional jumps, where control is transferred to the specified location regardless of any condition codes.
- Example: JMP target\_address // Jumps to the instruction located at target\_address unconditionally.

# • Conditional Jump

- Conditional jump instructions, such as JE, JNE, JG, and JL, transfer control based on the state of specific condition codes set by previous operations.
- Example: JE equal\_label // Jumps to equal\_label if the Zero Flag (ZF) is set, indicating the last comparison was equal.

# • Short and Near Jumps

- Short jumps are limited to a small range, typically within -128 to +127 bytes from the current instruction, whereas near jumps can cover a larger range within the current code segment.

- Example: JMP SHORT near\_label // Performs a short jump to a nearby label.

#### Far Jump

- Far jumps allow jumping to an instruction in a different code segment, specifying both the segment and offset.
- Example: JMP FAR segment:offset // Jumps to a specified segment and offset, allowing for transitions between different segments of memory.

# • Loop Instructions

- Loop instructions like LOOP, LOOPE, and LOOPNE combine the functionality of decrementing a counter and conditional jumping to facilitate the creation of loops.
- Example: LOOP loop\_start // Decrements the counter and jumps to loop\_start if the counter is not zero.

# Example of Using Jump Instructions

Here is a practical demonstration of using jump instructions to control program flow:

```
start_loop:

CMP EAX, 10 ; Compare EAX with 10.

JGE end_loop ; If EAX is greater or equal to 10, jump out of the loop.

INC EAX ; Increment EAX.

JMP start_loop ; Unconditionally jump back to the start of the loop.

end_loop:

; Continue with the rest of the program...
```

This example illustrates a simple loop that increments the EAX register until its value reaches 10. The CMP instruction sets condition codes based on the comparison, JGE conditionally breaks out of the loop if EAX is 10 or more, and JMP unconditionally continues the loop until the condition is met.

# Jump Instruction Encodings

Jump instruction encodings refer to the specific binary representation of jump instructions in assembly language programming. These encodings determine how jump instructions are interpreted by the CPU, specifying the type of jump (conditional or unconditional), the destination address, and other attributes necessary for execution. Understanding jump instruction encodings is essential for programmers who need to manage control flow precisely, optimize code, or engage in low-level debugging.

#### **Key Concepts**

# Opcode

- The opcode (operation code) is the part of the instruction that specifies the operation to be performed. For jump instructions, the opcode indicates whether the jump is conditional or unconditional.
- Example: The opcode for JMP (unconditional jump) is different from that of JE (jump if equal).

# • Operand Encoding

- The operands for jump instructions typically include the destination address. The encoding of this address can vary depending on the instruction's format—short, near, or far.
- Example: Short jumps encode the destination as a single byte offset relative to the next instruction, while near jumps use a word or dword for the offset.

# • Instruction Length

- The length of a jump instruction in bytes can vary based on the type of jump and the encoding of the destination address. Short jumps have shorter instruction lengths than near or far jumps.
- Example: A short jump instruction might be 2 bytes long, while a near jump could be 3 or more bytes, depending on the address size.

# • Relative vs. Absolute Addressing

- Jump instructions can use relative addressing, where the destination is given as an offset from the current instruction, or absolute addressing, specifying the exact memory address to jump to.
- Example: JMP instructions often use relative addressing to facilitate more efficient and relocatable code.

# • Conditional Jump Encoding

- Conditional jumps use specific opcodes to test condition codes (flags) and determine whether to take the jump. The encoding includes information on which flag to test.
- Example: The encoding for JE includes the opcode for the jump and the condition code test for zero (ZF set).

# Decoding a Jump Instruction

To illustrate how jump instruction encodings work, consider the encoding process for a simple conditional jump:

```
0x74 0x05 ; Opcode for JE (0x74) followed by a one-byte relative offset (0x05). ^{2}
```

This encoding represents a JE instruction that causes a jump to a location 5 bytes ahead if the Zero Flag (ZF) is set. The first byte (0x74) identifies the instruction as a JE jump, and the second byte specifies the jump distance. Understanding the encoding allows programmers to predict the size and behavior of their code at the machine level, optimizing for performance and space.

# Implementing Conditional Branches with Conditional Control

Implementing conditional branches with conditional control involves the use of conditional jump instructions and condition codes to direct the flow of execution based on the outcome of specific operations. This mechanism is fundamental to programming in assembly language, enabling complex decision-making processes, loops, and branching structures. Conditional control makes it possible to execute different code paths in response to runtime conditions, enhancing the versatility and functionality of programs.

# **Key Concepts**

# • Conditional Jump Instructions

- Conditional jump instructions, such as JE, JNE, JG, and JL, are used to implement conditional branches. These instructions test the condition codes set by previous operations and jump to specified locations in the code if the conditions are met.
- Example: JG greater\_label // Jumps to greater\_label if the last comparison indicated that the first operand is greater than the second.

#### • Setting Condition Codes

- Operations like CMP (compare) and TEST are used to set condition codes without modifying the operands. These instructions prepare the conditions for subsequent conditional jumps.
- Example: CMP EAX, EBX // Compares EAX and EBX and sets condition codes accordingly.

# • Combining Instructions for Branching

- A combination of comparison and jump instructions is used to create conditional branches.
   The comparison sets the condition codes, and the jump instruction uses these codes to decide whether to branch.
- Example: Using CMP followed by JE to branch if two values are equal.

# • Loop Control

- Conditional control is also crucial for implementing loops, where the loop continues as long
  as a condition is true. Instructions like LOOP, LOOPE, and LOOPNE are used in conjunction
  with conditional jumps to manage loop execution.
- Example: LOOPNE loop\_start // Continues looping to loop\_start as long as the Zero Flag is not set and the counter is not zero.

#### • Function Calls and Returns

- Conditional control can also influence function call execution, where conditions may determine whether a function is called or which function to call.
- Example: Conditional jumps can be used to call different functions based on runtime conditions.

# Example of Conditional Branching

The following example demonstrates implementing a conditional branch to execute different code based on the comparison of two values:

```
CMP EAX, EBX ; Compare EAX to EBX.

JE equal_path ; Jump to equal_path if EAX is equal to EBX.

JG greater_path ; Jump to greater_path if EAX is greater than EBX.

Continue with the default path if none of the above conditions are met.

equal_path:

Code to execute if EAX equals EBX.

JMP end_of_branch

greater_path:

; Code to execute if EAX is greater than EBX.

end_of_branch:

; Continue execution with the rest of the program.
```

This code segment uses CMP to compare EAX and EBX, followed by conditional jumps (JE and JG) to branch the execution based on the comparison results. This illustrates how conditional control enables dynamic decision-making in assembly programs.

# Implementing Conditional Branches with Conditional Moves

Implementing conditional branches with conditional moves involves using conditional move instructions to select between different values or actions based on the outcome of previous operations, without changing the program's flow with jumps. Conditional move instructions, such as CMOVZ (move if zero) and CMOVNZ (move if not zero), provide a way to make decisions in a program based on the condition codes set by earlier instructions. This approach can lead to more efficient execution on modern processors by avoiding the penalties associated with branch misprediction.

# **Key Concepts**

#### • Conditional Move Instructions

- Conditional move instructions perform data movement based on the state of specific condition codes. Unlike conditional jumps, these instructions do not alter the flow of execution but instead select between different operands.
- Example: CMOVZ EAX, EBX // Moves the value from EBX to EAX if the Zero Flag (ZF) is set.

# • Advantages Over Conditional Jumps

- Conditional moves can reduce the number of branches and thus minimize the impact of branch misprediction, leading to potentially more predictable and faster execution in certain scenarios.
- Example: Using CMOVZ to conditionally assign a value without the need for a branch.

# • Applicability and Limitations

- While conditional moves offer benefits in reducing branch mispredictions, they are not universally applicable. They are best used when the decision involves simple value assignments and does not require complex computations or effects that depend on branching.
- Example: CMOVZ is useful for simple value selections but cannot replace conditional jumps for complex branching logic.

# • Combining with Other Instructions

- Conditional move instructions can be effectively combined with arithmetic, logical, and comparison instructions to implement efficient conditional operations without branching.
- Example: A CMP instruction followed by a CMOVZ to conditionally move a value based on the comparison result.

#### • Performance Considerations

- The use of conditional moves should be balanced with performance considerations, as their benefit over conditional jumps depends on the specific circumstances, such as the processor's branch prediction capabilities and the nature of the decision logic.
- Example: In tight loops or highly predictable branches, the advantage of conditional moves might be less significant.

# Example of Using Conditional Moves

The following example illustrates the use of conditional moves to implement decision logic without branching:

```
CMP EAX, EBX ; Compare EAX to EBX.
CMOVZ ECX, EDX ; If EAX equals EBX (ZF is set), move EDX to ECX.
CMOVNZ ECX, ESI ; If EAX does not equal EBX (ZF is clear), move ESI to ECX.
Continue with the rest of the program, using ECX as the conditionally assigned value.
```

This code segment demonstrates how conditional moves can be used to select between EDX and ESI for assignment to ECX based on the result of a comparison between EAX and EBX. This approach avoids the need for conditional branching, potentially improving performance by reducing the impact of branch misprediction.

# Loops

Loops in assembly language are constructs that allow the execution of a sequence of instructions repeatedly as long as a given condition is satisfied. They are fundamental for tasks that require iteration, such as processing arrays, performing calculations a certain number of times, or waiting for an event. Assembly language provides several instructions specifically designed for looping, enabling efficient and precise control over the iteration process.

# **Key Concepts**

# • LOOP Instruction

- The LOOP instruction decrements the counter (traditionally stored in the ECX register) and jumps to a specified label if the counter has not reached zero. It combines the functionality of decrementing a counter and conditional branching into a single operation.
- Example: LOOP loop\_start // Decrements ECX and jumps back to loop\_start if ECX is not zero.

# • Conditional Loop Instructions

- Instructions such as LOOPE (loop while equal) and LOOPNE (loop while not equal) add a
  condition check to the basic loop functionality, allowing for more complex looping behavior
  based on the Zero Flag (ZF).
- Example: LOOPE loop\_start // Continues the loop as long as ZF is set and ECX is not zero.

# • Jump Instructions for Looping

- In addition to the specific loop instructions, general jump instructions like JMP, JE, JNE, etc., can be used to implement custom loop constructs, providing greater flexibility at the cost of potentially more complex code.
- Example: Using a combination of CMP and JNE to implement a loop that terminates based on a condition other than a counter reaching zero.

# • Implementing For Loops

- Assembly language does not have a direct equivalent to the high-level for loop construct, but a similar loop can be implemented using a combination of initialization, condition checking, and increment/decrement instructions.
- Example: Initializing a counter, using CMP and JLE for condition checking, and INC or DEC for incrementing or decrementing, respectively, to simulate a for loop.

#### • While Loops

- Similarly, while loops can be implemented using condition checks at the start or end of the loop body, using jump instructions to continue or exit the loop based on the condition.
- Example: Using TEST and JNZ at the start of the loop to implement a while loop that continues as long as a condition is true.

# Example of Implementing a Loop

The following code illustrates a simple loop that increments a value stored in EAX ten times:

```
1 MOV ECX, 10 ; Initialize the loop counter to 10.
2 loop_start:
3 DEC ECX ; Decrement the loop counter.
4 JZ loop_end ; Exit the loop if the counter has reached zero.
5 INC EAX ; Increment the value of EAX.
6 JMP loop_start ; Jump back to the start of the loop.
7 loop_end:
8 ; Continue with the rest of the program...
```

This example demonstrates the use of a decrement and jump approach to implement a loop, highlighting the manual control over looping behavior that assembly language affords. The JZ instruction is used to exit the loop when the counter reaches zero, while INC is used within the loop body to perform the desired operation.

#### Switch Statements

Switch statements in high-level languages like C or Java provide a way to execute different blocks of code based on the value of a variable. While assembly language does not have a direct equivalent of the switch statement, similar functionality can be implemented using a combination of comparison, jump, and table-driven techniques. This approach allows for efficient selection among multiple code paths based on the value of a given operand.

# **Key Concepts**

# Implementing with Comparison and Jump Instructions

- A simple form of a switch statement can be implemented by sequentially comparing the variable against the case values and using conditional jump instructions to branch to the corresponding code block for each case.
- Example: Using a series of CMP and JE instructions to implement a switch-case structure.

# • Jump Table Technique

- For switches with many cases or when efficiency is a concern, a jump table (also known as a branch table) can be used. This technique involves creating an array of addresses, each pointing to the code block for a case. The value of the variable is used as an index into the table to directly jump to the corresponding code block.
- Example: Using the JMP instruction in conjunction with an indexed array of labels to implement a jump table.

# • Calculating Table Offsets

- When using a jump table, the offset within the table is calculated based on the case value. This often requires adjusting the case value to ensure it matches the array index, especially if the case values do not start at zero or have gaps.
- Example: Adjusting the case value by subtracting the value of the first case to align it with the start of the jump table.

# • Default Case Handling

- Implementing a default case, which is executed if none of the case values match, can be done by adding a final jump at the end of the comparisons or jump table to branch to the default code block.  Example: Placing a JMP instruction to the default case label after all other case checks or at the end of the jump table.

# • Optimization Considerations

- The choice between a series of comparisons and a jump table depends on factors like the number of cases, the range of case values, and performance considerations. Jump tables offer faster execution time at the cost of additional memory, while comparison chains are simpler but potentially slower for large switch statements.
- Example: Evaluating the trade-offs between using sequential comparisons versus a jump table based on the specific requirements of the application.

# Example of Implementing a Switch Statement

The following code illustrates implementing a switch statement using a jump table:

```
; Assume EAX holds the case value
 Jump table addresses are stored in a data segment
JMP [JumpTable + EAX*4] ; Multiply case value by 4 (size of address) and jump
    ; Code blocks for each case
    ; Code for case 1
   JMP end_switch
case2:
    ; Code for case 2
    JMP end_switch
default_case:
    ; Code for default case
end switch:
; Continue with the rest of the program...
; JumpTable defined in the data segment
JumpTable:
   DD case1
   DD case2
    ; Addresses for other cases
   DD default_case ; Default case address
```

This example demonstrates the use of a jump table to efficiently implement a switch statement in assembly language, allowing for direct jumping to the code block corresponding to the value in EAX. The jump table provides a fast and scalable way to handle multiple cases.

The last section that will be covered from this chapter this week is **Section 3.7: Procedures**.

#### Section 3.7: Procedures

#### The Run-Time Stack

The run-time stack is a crucial concept in understanding how procedures (functions) are executed in computer programming, particularly in assembly language. It is a structured area of memory that supports the dynamic control flow of programs, allowing for function calls, parameter passing, local variable storage, and the handling of return addresses. The stack operates on a last-in, first-out (LIFO) principle, meaning that the last item pushed onto the stack is the first item to be popped off.

# **Key Concepts**

#### • Stack Frame

- Each function call creates a new stack frame or activation record on the stack. This frame

typically includes the function's return address, parameters, and local variables. The stack frame provides the necessary context for executing a function and returning control to the caller.

# • Push and Pop Operations

— The PUSH and POP instructions are used to add and remove data from the stack, respectively. These operations adjust the stack pointer (SP or ESP in x86 architecture), which points to the top of the stack.

#### • Function Call and Return

The call to a function involves pushing the return address onto the stack (done automatically by the CALL instruction) and then jumping to the function's code. The function's return involves popping the return address off the stack (done automatically by the RET instruction) and jumping back to that address.

# • Stack Pointer Management

 Proper management of the stack pointer is essential for maintaining the integrity of the run-time stack, especially during function calls and returns. The stack pointer must be adjusted appropriately to allocate and deallocate space for local variables and parameters.

#### • Base Pointer

- The base pointer (BP or EBP in x86 architecture) is used within functions to reference parameters and local variables consistently. It typically points to a fixed location within the stack frame for the duration of the function's execution.

# • Stack Overflow and Underflow

- Stack overflow occurs when the stack exceeds its allocated space, often due to excessive or infinite recursion. Stack underflow happens when attempting to pop more items from the stack than it contains, which can lead to program errors or crashes.

The run-time stack is a fundamental component of procedural programming, enabling the nested calling of functions, parameter passing, and local variable management in a controlled and structured manner. Understanding its operation and management is key to effective programming in assembly language and understanding the underlying mechanisms of higher-level languages.

# Control Transfer

Control transfer in assembly language refers to changing the normal sequential execution flow of a program. This is achieved through various instructions that allow for conditional and unconditional jumps, function calls, and returns. Control transfer mechanisms are essential for implementing decision-making, looping, and procedural abstraction in assembly language programs. They enable the execution of different code paths based on runtime conditions, the organization of code into reusable procedures, and the creation of complex, structured programs.

# **Key Concepts**

#### • Unconditional Jump

 Unconditional jump instructions (JMP) transfer control to a specified address or label without evaluating any condition. They are used to implement goto-like behavior, loop constructs, and for exiting from conditional constructs.

# Conditional Jump

Conditional jump instructions, such as JE (jump if equal), JNE (jump if not equal), and others, transfer control based on the outcome of a prior comparison or arithmetic operation.
 They are fundamental for decision-making structures like if-else and switch-case.

#### • Call and Return Instructions

The CALL instruction transfers control to a procedure or function, pushing the return address onto the stack. The RET instruction returns control to the calling procedure by popping the return address off the stack. These instructions are crucial for procedural abstraction and recursion.

# • Loop Instructions

 Loop instructions, such as LOOP, LOOPE, and LOOPNE, combine a decrement operation on a counter with a conditional jump, facilitating the implementation of loops based on a condition.

# • Interrupts and System Calls

 Interrupts and system call instructions (INT) transfer control to an interrupt handler or operating system function. These mechanisms allow for interaction with hardware devices and the execution of system-level operations.

# • Indirect Jump and Call

 Indirect jump and call instructions transfer control to an address specified in a register or memory location. This allows for dynamic determination of the target address, useful in implementing function pointers, virtual method calls, and similar constructs.

Control transfer instructions are the backbone of assembly language programming, providing the flexibility to implement a wide range of programming constructs and control flow patterns. Understanding and effectively using these instructions is crucial for developing efficient and maintainable assembly language programs.

#### Data Transfer

Data transfer in assembly language involves moving data between registers, memory locations, and I/O devices. These operations are fundamental for any assembly language program, as they enable the manipulation and storage of data during execution. Data transfer instructions vary in complexity and specificity, from simple moves between registers to more complex operations involving addressing modes and immediate values.

# **Key Concepts**

#### • MOV Instruction

- The MOV instruction is the most basic form of data transfer, used to move data from one location to another. It can transfer data between registers, from memory to a register, or from a register to memory. The source data remains unchanged.

#### • Addressing Modes

 Addressing modes define how the address of the data to be accessed is calculated. Common addressing modes include immediate (direct value), register (using a register's value as an address), direct (using a specific memory address), and indirect (using an address held in a register or memory).

# • PUSH and POP Instructions

PUSH and POP instructions are used for stack operations. PUSH decrements the stack pointer
and places data on the top of the stack, while POP removes data from the top of the stack
and increments the stack pointer.

#### • Load and Store Instructions

 Load instructions move data from memory into a register, while store instructions move data from a register to memory. These operations are crucial for interacting with data stored in memory.

# • Input/Output Instructions

I/O instructions transfer data between the CPU and peripheral devices. Instructions like
 IN and OUT are used for reading from and writing to I/O ports, respectively.

# • String Instructions

 String instructions perform operations on sequences of bytes or words. Instructions like MOVS, LODS, and STOS are used for moving, loading, and storing string data efficiently.

# • Exchange Instructions

 The XCHG instruction swaps the contents of two registers or a register and a memory location. This can be useful for rearranging data without requiring a temporary storage location.

Data transfer instructions form the backbone of assembly language programs, enabling the manipulation of data across the system's various components. Mastery of these instructions and the underlying principles of data movement is essential for effective programming in assembly language.

# Local Storage on the Stack

Local storage on the stack refers to the temporary storage of data within a function's stack frame during its execution. This technique is widely used for allocating space for local variables, parameters, and for preserving the values of registers that are used within the function. The stack's last-in, first-out (LIFO) nature makes it an ideal structure for managing data in a nested or recursive function call scenario.

# **Key Concepts**

# • Allocating Space for Local Variables

Space for local variables can be allocated on the stack by adjusting the stack pointer (SP or ESP in x86 architecture). This is typically done by subtracting the size of the required storage from the stack pointer at the beginning of a function.

# • Saving and Restoring Registers

Registers that are used within a function and need to be preserved are saved on the stack
at the function's start and restored before the function returns. This ensures that the
function does not disrupt the calling context.

# • Accessing Local Variables and Parameters

 Once space has been allocated on the stack, local variables and parameters can be accessed using the base pointer (BP or EBP in x86 architecture) with appropriate offsets.

#### • Cleaning Up the Stack

- Before returning from a function, any space allocated on the stack for local variables must be deallocated, typically by adding back the subtracted space to the stack pointer. This cleans up the function's stack frame and prepares the stack for the next function call.

# Example of Using Local Storage on the Stack

The following assembly code snippet demonstrates allocating space for local variables, saving a register value, and then cleaning up the stack before returning from a function:

```
; Save the old base pointer value
                        ; Set the new base pointer to the current stack pointer
mov ebp, esp
sub esp,
                          Allocate 8 bytes of space on the stack for local variables
                        ; Save the value of eax register (if it needs to be
push eax
preserved)
; The function's code, using the allocated local storage
 For example, accessing the first local variable:
mov [ebp-4], 1234
                        ; Store 1234 in the first local variable (4 bytes from BP)
                        ; Restore the value of eax register
                        ; Deallocate local variables by resetting the stack pointer
mov esp, ebp
pop ebp
                          Restore the old base pointer value
                        ; Return to the calling function
```

This example highlights the typical use of the stack for managing local data within a function, including allocating space for local variables, preserving and restoring register values, and ensuring proper cleanup of the stack before the function exits.

# Local Storage in Registers

Local storage in registers is a technique used in assembly language programming to utilize the CPU's registers for storing temporary data during a function's execution. This approach is favored for its speed, as accessing data in registers is significantly faster than accessing data in memory. However, the limited number of registers available means that this technique is best used for the most frequently accessed data or for data that requires rapid manipulation.

#### **Key Concepts**

# • Register Selection

- Choosing which registers to use for local storage is critical. General-purpose registers (like EAX, EBX, ECX, EDX in x86 architecture) are commonly used for this purpose, but the specific choice depends on the function's needs and the calling convention being used, which dictates registers that need to be preserved across function calls.

# • Preserving Register Values

- When using registers for local storage, it's essential to save the original values if they are used by the caller. This is usually done by pushing the registers' contents onto the stack at the beginning of the function and popping them back before returning.

# • Efficiency Considerations

 Utilizing registers for local data storage can significantly increase the efficiency of a function by reducing memory access times. However, this must be balanced with the need to preserve the context for the calling function, especially in applications with deep call stacks or limited register availability.

# • Usage Patterns

 Registers are ideally used for variables that are frequently accessed or modified, such as loop counters, temporary calculations, or as pointers to data structures in memory.

# Example of Using Local Storage in Registers

The following assembly code snippet demonstrates using registers for local storage within a function, including the preservation and restoration of the register values:

```
push ebx ; Preserve the value of ebx
mov eax, [some_value] ; Use eax for a temporary calculation
add eax, 10 ; Perform an operation using eax
mov ebx, eax ; Use ebx as local storage for the result of the calculation
; Perform more operations using ebx as needed
pop ebx ; Restore the original value of ebx before returning
ret ; Return to the calling function
```

In this example, EAX is used for a temporary calculation, and EBX is used for storing the result of this calculation for further use within the function. The value of EBX is preserved at the start of the function and restored before returning, ensuring that the function's use of the register does not affect the calling context.

#### Recursive Procedures

Recursive procedures are functions that call themselves, either directly or indirectly, allowing for the solution of problems by breaking them down into smaller, similar problems. This programming technique is common in many high-level languages and can also be implemented in assembly language. Recursive procedures in assembly require careful management of the run-time stack to handle the function's local variables, parameters, and return addresses for each recursive call.

# **Key Concepts**

# • Base Case and Recursive Case

A recursive procedure must have at least one base case, which stops the recursion by not
making further recursive calls, and one or more recursive cases that include the function
calling itself with modified parameters.

# • Stack Usage

- Each recursive call uses stack space to store its parameters, local variables, and return address. This means that the depth of recursion is limited by the size of the stack. Deep recursion can lead to a stack overflow.

# • Preserving Register Values

Registers used in a recursive procedure may need to be preserved across recursive calls, especially if they hold data that is used after a recursive call returns. This is typically done by pushing the registers onto the stack at the beginning of the procedure and popping them before returning.

# • Parameter Passing

 Parameters to recursive calls can be passed using the stack or registers, depending on the number of parameters and the calling convention used. Parameters passed on the stack are automatically available to the called procedure in its new stack frame.

#### • Handling Recursion Termination

 Care must be taken to ensure that the recursion terminates properly by meeting a base case condition. Failure to do so can lead to infinite recursion and a stack overflow.

# Example of a Recursive Procedure

The following assembly code snippet demonstrates a simple recursive procedure that calculates the factorial of a number:

```
factorial:
        push ebp
                                     ; Save the old base pointer
        mov ebp, esp
                                     ; Set the new base pointer
        sub esp, 4
                                     ; Allocate space for local variables if needed
        mov eax, [ebp+8]
                                     ; Move the argument (n) into eax
        cmp eax, 1
                                     ; Compare n to 1 (base case)
                                     ; If n \le 1, jump to the end
        jle end_factorial
                                     ; Decrement n
        dec eax
                                     ; Push n-1 onto the stack
10
        push eax
        call factorial
                                     ; Recursive call to factorial(n-1)
        mov ebx, [ebp+8]
                                     ; Load n again into ebx
                                     ; Multiply the result by n
13
        imul eax, ebx
    end_factorial:
        mov esp, ebp
                                     ; Clean up the stack
        pop ebp
                                     ; Restore the old base pointer
        ret
                                     ; Return with the result in eax
```

This example calculates the factorial of a number by recursively calling itself with the decremented value until it reaches the base case of 1. Each recursive call is prepared by pushing the next value of n onto the stack and using the IMUL instruction to multiply the return value of the recursive call by n. Stack cleanup and register preservation are handled carefully to maintain the correct state across calls.



# Week 5: (2/12 - 2/18)



# Machine-Level Representation of Programs

# 5.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 3.8 Array Allocation & Access
- Computer Systems: Chapter 3.9 Heterogeneous Data Structures
- Computer Systems: Chapter 3.10 Combining Control Data

#### 5.0.2 Piazza

Piazza discussions are optional this week. (2/20/24)

#### 5.0.3 Lectures

The lecture videos for this week are:

- Data Arrays  $\approx 21$  min.
- Heterogenous Data Structures  $\approx 25$  min.
- Data Pointers  $\approx 20$  min.
- Attacks Buffer Overflow  $\approx 20$  min.
- Detailed Walkthrough Of Last Question In 3.10 Quiz  $\approx 11$  min.

The lecture notes for this week are:

- Machine-Level Programming V Buffer Overflows And Attacks Lecture Notes
- Machine-Level Programming V Buffer Overflows And Attacks Worms, Viruses And ROP Lecture Notes
- Program Optimization Lecture Notes

# 5.0.4 Assignments

The assignment for this week is:

- Attack Lab (3/12/24)
- Attack Lab Interview (3/12/24)

# 5.0.5 Quiz

The quizzes for this week are:

- Chapter 3.8 3.9 Quiz Chapter 3.8 3.9 Finalized Quiz (2/20/24)
- Chapter 3.9 3.10 Quiz Chapter 3.9 3.10 Finalized Quiz (2/20/24)

# 5.0.6 Chapter Summary

The first section from Chapter 3: Machine-Level Representation Of Programs that is being covered this week is Section 3.8: Array Allocation And Access.

# Section 3.8: Array Allocation And Access

# Array Allocation and Access

Array allocation and access at the machine level involve managing continuous blocks of memory to store sequences of elements and implementing mechanisms to read from and write to these elements efficiently. In assembly language, arrays can be statically allocated at compile time or dynamically allocated at runtime, and accessing array elements requires calculating the memory address of each element based on its index.

# **Key Concepts**

# • Static Array Allocation

- Static arrays are allocated in the data segment of a program with a fixed size determined at compile time. The assembler reserves the necessary amount of memory space based on the array's declared size and element type.
- Example: An array of integers might be declared and initialized with specific values or reserved with a predefined size using directives like DB, DW, or DD in x86 assembly.

#### • Dynamic Array Allocation

- Dynamic arrays are allocated at runtime, typically using system calls or library functions to request memory from the heap. The size of the array can be determined during the execution of the program, allowing for more flexible data structures.
- Example: Using the malloc function in a C program, which can be called from assembly code to allocate memory for an array dynamically.

#### • Accessing Array Elements

- Elements within an array are accessed by calculating their memory addresses. This
  calculation involves the base address of the array, the size of each element, and the index
  of the element to be accessed.
- Example: To access the i-th element of an array starting at base address ebx with 4-byte integers, the address of the element is ebx + i\*4.

# • Indexing and Address Calculation

- Assembly languages often provide instructions that facilitate efficient calculation of element addresses using indexing and scaling. These instructions can automatically multiply the index by the size of the array elements to compute the offset from the base address.

# • Looping Through Arrays

- Iterating over array elements typically involves looping constructs that increment an index variable, calculate the address of the current element, and then perform some operation on the element. This process repeats until the end of the array is reached.

# Example of Static Array Allocation and Access

The following assembly code snippet demonstrates declaring a static array and accessing its elements:

```
section .data
array DW 1, 2, 3, 4, 5 ; Declare a static array of 5 integers
global start
_start:
    mov ebx, array
                         ; Load the base address of the array into ebx
                          ; Initialize the index to 0 (first element)
    mov ecx. 0
loop start:
                          ; Compare the index with the array size
    cmp ecx, 5
                         ; If index >= 5, exit the loop
    jge end_loop
    mov ax, [ebx + ecx*2]; Load the value of the array at index ecx into ax
    ; Do something with {\tt ax}
                         ; Increment the index
    inc ecx
                         ; Jump back to the start of the loop
    jmp loop_start
end loop:
    ; Continue with the rest of the program...
```

This example demonstrates allocating a static array of integers and iterating over its elements using a loop. The address of each element is calculated by adding the scaled index (ecx\*2, because each integer is 2 bytes with the DW directive) to the base address of the array (ebx).

The next section that is being covered from this chapter this week is **Section 3.9: Heterogeneous Data Structures**.

# Section 3.9: Heterogeneous Data Structures

# Heterogeneous Data Structures

Heterogeneous data structures in the context of machine-level representation of programs refer to structures that can store elements of different types and sizes. Unlike arrays, which are homogeneous and store elements of the same type, heterogeneous data structures, such as structs in C or records in other languages, allow for the combination of various data types in a single, unified structure. Implementing and accessing these structures in assembly language requires careful management of memory layout and understanding how to calculate offsets for each element.

# **Key Concepts**

# • Structure Definition

Structures are defined by specifying the type and order of their elements. In assembly language, this is often represented by allocating a block of memory where each element is placed at a specific offset from the beginning of the block.

# • Memory Layout and Alignment

The memory layout of a structure is crucial for correctly accessing its elements. Elements are typically aligned in memory according to their size to facilitate efficient access, which may introduce padding bytes between elements to meet alignment requirements.

#### • Accessing Elements

- Accessing elements within a heterogeneous structure involves calculating the address of

the element, which is the base address of the structure plus the offset of the element. The offset must account for the sizes and alignment of all preceding elements.

# • Static and Dynamic Allocation

- Like arrays, structures can be statically allocated in the data segment or dynamically allocated at runtime on the heap. The allocation method affects how the base address of the structure is obtained and managed.

#### • Nested Structures

 Structures can contain other structures as elements, creating nested or complex data structures. Accessing elements in nested structures requires calculating offsets within offsets.

# Example of Defining and Accessing a Heterogeneous Structure

The following assembly code snippet demonstrates defining a simple structure with heterogeneous types and accessing its elements:

```
section .data
; Define a structure with an integer, a character, and a pointer
struct:
    int_val DD 12345
                                 ; 4 bytes for the integer
    char_val DB 'A'
                                 ; 1 byte for the character
    ptr_val DD ptr_to_some_data
                                 ; 4 bytes for the pointer
section .text
global _start
_start:
   mov ebx, struct
                         ; Load the base address of the structure into ebx
    ; Access the integer value
                        ; Move the integer value at the start of the structure into
    mov eax, [ebx]
    ; Access the character value
                        ; Move the character value at offset 4 into al
    mov al. [ebx + 4]
    ; Access the pointer value
   mov eax, [ebx + 5] ; Move the pointer value at offset 5 into eax (assuming no
padding)
    ; Continue with the rest of the program...
```

This example defines a structure containing an integer, a character, and a pointer. It demonstrates how to calculate the offsets for each element based on their sizes and order in the structure. Note that alignment and padding issues are not addressed in this simplified example but would need to be considered in a real-world scenario to ensure correct access to each element.

The last section that will be covered from this chapter this week is **Section 3.10: Combining Control And Data In Machine-Level Programs**.

# Section 3.10: Combining Control And Data In Machine-Level Programs

# Combining Control and Data in Machine-Level Programs

Combining control and data in machine-level programs refers to the techniques and practices used to manage both the flow of execution and the manipulation of data efficiently within the same program. This integration is crucial for creating complex, high-performance applications that can handle a variety of tasks, from simple calculations to managing complex data structures and algorithms.

Effective combination of control and data involves understanding how the CPU executes instructions, how data is stored and accessed in memory, and how these elements can be orchestrated to achieve the desired program behavior.

# **Key Concepts**

# • Instruction Set Architecture (ISA)

- The ISA defines the set of instructions and capabilities of a CPU, including how data operations (like arithmetic and logical operations) and control operations (like jumps and calls) are supported. Understanding the ISA is fundamental to effectively combining control and data.

#### • Conditional Execution

 Combining control flow with data involves the use of conditional instructions that alter the program's execution path based on the results of data comparisons or the state of specific flags in the CPU's status register.

# • Procedural Abstraction

Procedures or functions encapsulate specific operations on data, allowing for the reuse
of code and more organized program structures. The call and return mechanisms are
examples of control flow operations that interact closely with data.

# • Loop Unrolling and Optimization

- Loop unrolling is an optimization technique that reduces the control overhead of loops by executing multiple iterations of the loop body within a single loop iteration. This combines control and data manipulation by reducing the number of jump and comparison instructions needed.

# • Inline Assembly

In higher-level languages, inline assembly allows developers to embed assembly language instructions directly within a high-level language program. This provides fine-grained control over both the data and control flow aspects of a program, allowing for optimizations and functionality that are not possible or efficient in the high-level language alone.

#### • Data-Driven Control Flow

- Techniques like function pointers, jump tables, and virtual method tables in object-oriented programming are examples of data-driven control flow, where the data (e.g., the address of a function) determines the flow of execution.

# Example of Combining Control and Data

A practical example of combining control and data in a machine-level program is the implementation of a simple switch-case statement using a jump table, which is a data structure that maps case labels to their corresponding addresses in memory. This allows for efficient, data-driven decision-making:

```
; Assume EAX contains the case value
; JumpTable is an array of addresses to the case handlers
mov ebx, [JumpTable + eax*4]; Calculate the address of the handler
jmp ebx; Jump to the handler

; Handlers for each case follow
case1_handler:
; Handle case 1
jmp end_switch
case2_handler:
; Handle case 2
; etc.

Handle case 2
; etc.

continue execution after the switch
```

This example demonstrates how control flow (the jump to a case handler) is directly influenced by data (the case value in EAX and the jump table addresses). This technique combines control and data to implement a common high-level programming construct at the machine level, showcasing the efficiency and flexibility of machine-level programming.



# Week 6: (2/19 - 2/25)



# Processor Architecture And Optimizing Program Performance

# 6.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 4.3 Sequential Processors
- Computer Systems: Chapter 4.4 Pipelined Processor
- Computer Systems: Chapter 5.1 Capabilities & Limitations of Compilers
- Computer Systems: Chapter 5.2 Expressing Program Performance
- Computer Systems: Chapter 5.3 Program Example
- Computer Systems: Chapter 5.4 Eliminating Loop Inefficiencies
- Computer Systems: Chapter 5.5 Reducing Procedure Calls
- Computer Systems: Chapter 5.6 Eliminating Unnecessary Memory References

#### 6.0.2 Piazza

Piazza discussions are optional this week. (2/27/24)

#### 6.0.3 Lectures

The lecture videos for this week are:

- Attacks Worms, Viruses & ROP  $\approx 25$  min.
- Understanding Processor Performance Sequential Processors  $\approx 16$  min.
- Understanding Processor Performance Pipelines  $\approx 13$  min.
- Optimization Generally Useful Optimizations  $\approx 32$  min.
- Attack Lab Orientation Video  $\approx 36$  min.

The lecture notes for this week are:

- Program Optimization Lecture Notes
- Program Optimization ILP Lecture Notes
- Processors And Pipelines I Lecture Notes
- Processors And Pipelines II Lecture Notes
- Processors And Pipelines III Lecture Notes

# 6.0.4 Assignments

The assignment for this week is:

- Attack Lab (3/12/24)
- Attack Lab Interview (3/12/24)

# 6.0.5 Quiz

The quizzes for this week are:

• Chapter 5.1-5.6 Quiz • Chapter 5.1-5.6 Finalized Quiz (2/27/24)

# 6.0.6 Chapter Summary

The first chapter that is being covered this week is **Chapter 4: Processor Architecture**. The first section that is being covered from this chapter this week is **Section 4.3: Sequential Y86-84 Implementations**.

# Section 4.3: Sequential Y86-84 Implementations

# Sequential Y86-84 Implementations

Sequential Y86-84 implementations refer to a simplified instruction set architecture (ISA) based on the x86 architecture, designed primarily for educational purposes to teach the fundamentals of processor architecture and sequential execution. The Y86-84 ISA simplifies many aspects of the x86 architecture, allowing students and beginners to grasp the basic concepts of processor design, instruction execution, and control flow without the complexity of a full-featured x86 processor. Sequential implementations execute instructions one at a time, in the order they appear in the program, without any parallel execution or pipelining.

# **Key Concepts**

# • Instruction Set

The Y86-84 ISA includes a subset of the instructions found in the x86 architecture, such as move (MOV), arithmetic operations (ADD, SUB), and control flow instructions (JMP, JE, JNE).
 These instructions are simplified to focus on the fundamental operations of the processor.

# • Registers

 Similar to x86, the Y86-84 architecture includes a set of general-purpose registers used for arithmetic operations, addressing, and temporary data storage. However, the number of registers is typically reduced to simplify the architecture.

#### • Sequential Execution

- In a sequential Y86-84 implementation, instructions are executed one after another, with each instruction completing before the next one begins. This contrasts with pipelined or parallel architectures, where multiple instructions may be in different stages of execution simultaneously.

# • Memory Access

 Memory operations in the Y86-84 architecture are simplified to basic load and store instructions, allowing for straightforward implementation of memory access without the complexities of caching or advanced memory management techniques.

# • Control Flow

Control flow instructions alter the sequential execution order, enabling the implementation
of loops, conditional execution, and function calls. These instructions are critical for
creating dynamic and functional programs within the Y86-84 architecture.

# • Implementation and Simulation

The simplicity of the Y86-84 ISA makes it suitable for implementation and simulation in educational settings. Students can build and observe the inner workings of a sequential processor, including the fetch-decode-execute cycle, register operations, and memory access.

# Example of Sequential Y86-84 Implementation

While specific code examples are beyond the scope of this summary, a typical exercise in implementing a sequential Y86-84 processor might involve simulating the fetch-decode-execute cycle:

- 1. **Fetch**: The processor retrieves the next instruction from memory, based on the current value of the program counter (PC).
- 2. **Decode**: The instruction is decoded to determine the operation to be performed and the operands involved.
- 3. **Execute**: The processor executes the instruction, which may involve arithmetic operations, updating registers, or modifying the PC to change the flow of execution.

This simplified model allows students to understand the core aspects of processor operation, instruction execution, and the impact of different instructions on the state of the processor and memory.

The next section that is going to be covered from this chapter this week is **Section 4.4: General Principles Of Pipelining**.

# Section 4.4: General Principles Of Pipelining

# General Principles of Pipelining

Pipelining is a fundamental concept in computer architecture that enhances the processing speed of a CPU by dividing the execution process into multiple stages and executing multiple instructions simultaneously, each at a different stage. This approach allows for increased instruction throughput—the number of instructions that can be completed in a unit of time—by taking advantage of parallelism within the execution process.

# **Key Concepts**

# • Pipeline Stages

- A typical instruction pipeline includes several stages, such as instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write-back (WB). Each stage performs a part of the instruction's execution, and multiple instructions can be in different stages of execution simultaneously.

# • Instruction Throughput

 Pipelining improves instruction throughput by allowing a new instruction to enter the pipeline at each clock cycle, ideally completing an instruction at every cycle once the pipeline is full.

#### Hazards

Pipelining introduces the possibility of hazards, which are situations that prevent the
next instruction in the pipeline from executing in the following cycle. Hazards include
data hazards, control hazards, and structural hazards, each requiring specific strategies to
mitigate.

#### • Data Hazards

— Data hazards occur when instructions that are close together in the pipeline need to access the same data but do so in an order that could produce incorrect results. Techniques like forwarding or stalling the pipeline are used to handle data hazards.

# • Control Hazards

 Control hazards arise from the execution of control flow instructions, such as jumps and branches, which can change the instruction sequence. Predicting the outcome of branches and pre-fetching instructions based on these predictions can mitigate control hazards.

# • Structural Hazards

 Structural hazards occur when multiple instructions in the pipeline require the same hardware resource simultaneously. These can be mitigated by duplicating resources or scheduling instructions to avoid conflicts.

# • Pipeline Depth and Performance

The depth of a pipeline, or the number of stages, affects its performance. A deeper pipeline can offer higher theoretical throughput but may suffer from increased latency per instruction and greater vulnerability to hazards.

# Example of Pipelining Impact

Consider a simplified CPU with a 5-stage pipeline (IF, ID, EX, MEM, WB). If each stage takes one clock cycle to complete, and there are no hazards or stalls, the CPU can theoretically achieve a throughput of one instruction per cycle, significantly enhancing performance over a non-pipelined processor where each instruction would need to complete before the next begins.

However, the presence of a branch instruction that alters the control flow could introduce a control hazard, potentially stalling the pipeline until the branch's target is known. Implementing branch prediction in this scenario could help maintain high throughput by guessing the branch's outcome and continuing instruction fetch based on that guess, with mechanisms to correct mispredictions.

The next chapter that is being covered this week is **Chapter 5: Optimizing Program Performance**. The first section that is being covered from this chapter this week is **Section 5.1: Capabilities And Limitations Of Optimizing Compilers**.

# Section 5.1: Capabilities And Limitations Of Optimizing Compilers

# Capabilities and Limitations of Optimizing Compilers

Optimizing compilers are advanced tools designed to improve the efficiency and performance of programs by transforming code in ways that reduce execution time, minimize memory usage, and enhance overall execution speed without altering the program's output or behavior. While these compilers have significantly advanced, offering a wide range of optimizations, there are inherent

capabilities and limitations in their approach to optimizing program performance.

# Capabilities of Optimizing Compilers

# • Code Optimization Techniques

Optimizing compilers employ various techniques such as loop unrolling, constant folding, dead code elimination, and function inlining to enhance performance. These optimizations are applied during the compilation process and are based on a deep understanding of the target architecture and execution model.

# • Automatic Parallelization

 Some compilers can automatically parallelize code, transforming sequential operations into parallel counterparts that can be executed simultaneously on multiple cores or processors, thus reducing execution time.

# • Hardware-Specific Optimizations

 Compilers often include optimizations tailored to specific hardware architectures, utilizing special instruction sets and hardware capabilities (such as vector operations) to speed up execution.

# • Profile-Guided Optimization (PGO)

- PGO involves compiling a program multiple times, using execution profiles from earlier runs to inform optimization decisions in subsequent compilations. This can lead to more effective optimizations by focusing on the most frequently executed paths.

# Limitations of Optimizing Compilers

#### Understanding High-Level Intent

- Compilers may not fully grasp the high-level intent or semantics behind a piece of code, limiting their ability to apply more aggressive or transformative optimizations that require an understanding of the program's purpose or expected behavior.

# • Aliasing and Pointer Analysis

 Compilers can struggle with optimizing code that makes extensive use of pointers and memory aliasing, as the potential for different pointers to refer to the same memory location complicates analysis and limits optimization opportunities.

#### Dynamic Behavior and Runtime Conditions

The dynamic nature of certain programs, especially those that rely heavily on runtime data and conditions, poses challenges for compilers that must make optimization decisions at compile time without full knowledge of runtime behavior.

# • Trade-offs and Heuristics

Optimizing compilers often rely on heuristics to make decisions about which optimizations to apply. These heuristics can sometimes lead to suboptimal choices due to the complexity of accurately predicting the impact of an optimization on diverse hardware and software environments.

# Example of Compiler Optimization

An example of compiler optimization could involve loop unrolling, where the compiler transforms a loop to execute multiple iterations within a single pass to reduce the overhead of loop control instructions. For instance, a loop that increments a counter could be unrolled to increment the counter by a larger step each time, reducing the total number of iterations and, consequently, the loop overhead.

However, if the loop accesses an array through a pointer, and there's uncertainty about what other pointers might reference the same array (aliasing), the compiler may be unable to apply certain optimizations safely, highlighting the balance between aggressive optimization and the need to maintain correct program behavior.

The next section that is being covered from this chapter this week is **Section 5.2: Expressing Program Performance**.

# Section 5.2: Expressing Program Performance

# **Expressing Program Performance**

Expressing program performance is a crucial aspect of computer science, especially when optimizing program code or comparing the efficiency of different algorithms and systems. Performance can be measured and expressed in various ways, each highlighting different aspects of a program's execution characteristics. Understanding these metrics is essential for developers aiming to optimize applications and for researchers conducting performance evaluations.

# **Key Concepts**

#### • Execution Time

The most direct measure of program performance is execution time, also known as running time or wall-clock time. It measures how long a program takes to complete its execution from start to finish. Execution time can be affected by a wide range of factors, including processor speed, memory access times, and I/O operations.

#### • Throughput and Latency

Throughput refers to the amount of work done per unit of time, such as tasks per second, while latency measures the time it takes for a single task to complete. These metrics are particularly relevant in the context of servers and real-time systems.

#### Clock Cycles

- Program performance can also be expressed in terms of clock cycles, which measure the number of processor cycles required to execute a program. This metric is closely tied to the CPU's clock speed and the efficiency of the program's instruction sequence.

#### • Instructions Per Cycle (IPC)

 IPC is a measure of a processor's efficiency, indicating how many instructions can be executed per clock cycle. Higher IPC values indicate better utilization of the CPU's resources.

#### CPU Utilization and Efficiency

These metrics evaluate how effectively a program uses the CPU's computational resources.
 Efficient programs maximize CPU utilization while minimizing idle time and unnecessary operations.

# Benchmarking

 Benchmark tests run standardized tasks or programs to evaluate the performance of hardware or software systems. Benchmarks provide a comparative measure of performance across different systems or configurations.

# Profiling

 Profiling tools analyze a program's execution to identify where it spends most of its time or consumes the most resources. This information is critical for targeted optimizations.

# Example of Expressing Program Performance

Consider a program that processes a set of data. One way to express its performance is by measuring the total execution time, such as seconds or milliseconds. For a more detailed analysis, we could look at the number of instructions executed and the IPC rate to understand how efficiently the program uses the CPU.

Additionally, using profiling tools, we might discover that a significant portion of the execution time is spent in a specific function or waiting for I/O operations. This insight allows us to express performance in terms of specific bottlenecks, guiding optimization efforts more effectively.

In a comparative context, benchmarking could be used to measure the program's performance across different hardware setups, expressing results in terms of throughput (e.g., data items processed per second) or scalability (how performance changes with increasing data size or processing cores).

The next section that is being covered from this chapter this week is **Section 5.3: Program Example**.

# Section 5.3: Program Example

# **Program Example**

In the context of optimizing program performance, examining a specific program example can illustrate how performance analysis and optimization strategies are applied in practice. This approach helps in understanding the impact of different coding patterns, algorithm choices, and compiler optimizations on the execution speed and efficiency of a program. Through a detailed examination of a program example, we can explore the concepts of benchmarking, profiling, and the application of optimization techniques to enhance program performance.

# **Key Concepts**

# · Benchmarking and Profiling

Benchmarking involves running a set of standardized tests to measure the performance
of a program, while profiling is the process of analyzing a program to determine which
parts consume the most time or resources. Both are crucial for identifying performance
bottlenecks and opportunities for optimization.

# • Optimization Techniques

 Techniques such as loop unrolling, function inlining, and data structure optimization can significantly impact program performance. The choice of algorithm can also drastically affect execution time and resource usage.

#### • Compiler Optimizations

 Understanding the capabilities of the compiler to apply automatic optimizations, and how to enable or guide these optimizations through code annotations or compiler flags, is essential for achieving optimal performance.

#### • Parallelization

- For programs that can be executed in parallel, dividing the workload across multiple processors or cores can lead to significant performance improvements. Identifying parallelizable components and effectively implementing parallelism are key challenges.

# Program Example: Matrix Multiplication

Consider a program that performs matrix multiplication, a common operation in many scientific and engineering applications. The naive implementation involves three nested loops to compute the product of two matrices. This example can serve as a basis for exploring several performance-related concepts:

- **Benchmarking**: Measuring the execution time of the matrix multiplication program with different matrix sizes provides a baseline for performance.
- **Profiling**: Identifying the innermost loop as the critical path where the majority of execution time is spent.
- Optimization Techniques: Applying loop unrolling to the innermost loop to reduce loop overhead, reordering loops to improve cache locality, or utilizing more efficient algorithms like Strassen's algorithm for large matrices.
- Optimizations: Investigating the impact of compiler optimization levels (-O2, -O3) and specific flags that enable vectorization or other architectural optimizations.
- Parallelization: Implementing a parallel version of the matrix multiplication using threads or SIMD instructions to utilize multiple cores or vector units available in modern CPUs.

Through this example, one can illustrate the process of analyzing and optimizing a program from initial development through to achieving optimized performance, highlighting the practical application of concepts discussed in the context of expressing program performance.

The next section that is being covered from this chapter this week is **Section 5.4: Eliminating Loop Inefficiencies**.

# Section 5.4: Eliminating Loop Inefficiencies

# **Eliminating Loop Inefficiencies**

Eliminating loop inefficiencies is a critical aspect of optimizing program performance. Loops are fundamental constructs in programming, but they can also be sources of significant performance bottlenecks, especially in compute-intensive applications. By addressing inefficiencies within loops, programmers can greatly enhance the execution speed and efficiency of their applications.

# Key Strategies for Eliminating Loop Inefficiencies

# • Loop Unrolling

- Loop unrolling involves replicating the body of the loop multiple times within a single iteration, reducing the overhead associated with the loop control mechanism (incrementing

the index, evaluating the termination condition). This can lead to faster execution but at the cost of increased code size.

# • Loop Fusion

- Loop fusion combines the bodies of two or more loops that iterate over the same range into a single loop, reducing loop overhead and potentially improving cache locality by accessing data in a more sequential manner.

# • Loop Fission or Distribution

- Loop fission, or distribution, separates a loop into multiple loops over the same range but with each new loop performing a part of the work of the original loop. This can improve cache performance and allow for parallel execution of the loops.

# • Removing Loop Invariants

 Loop invariant code motion optimizes performance by moving code that does not change across iterations of the loop (invariant code) outside the loop. This reduces the amount of work done within the loop.

# • Induction Variable Simplification

- This optimization technique simplifies the arithmetic of induction variables, potentially reducing the complexity of loop index calculations and making the loops run faster.

# • Software Pipelining

Software pipelining reorders operations across loop iterations to create a steady state where
each iteration's operations are partially overlapped with operations from other iterations,
similar to hardware instruction pipelining.

# Minimizing Data Dependencies

 Reducing data dependencies within loops can enable more aggressive optimizations by the compiler, including parallelization of loop iterations.

# Example of Eliminating Loop Inefficiencies

Consider a simple loop that processes an array:

```
for (int i = 0; i < N; ++i) {
    array[i] = array[i] * 2 + 1;
}
</pre>
```

# **Optimization 1: Loop Unrolling**

Manually unrolling the loop can decrease the number of iterations and reduce the loop overhead:

```
for (int i = 0; i < N; i += 2) {
    array[i] = array[i] * 2 + 1;
    if (i + 1 < N) {
        array[i + 1] = array[i + 1] * 2 + 1;
    }
}</pre>
```

# Optimization 2: Removing Loop Invariants

If the loop contained invariant calculations or conditions, moving them outside the loop would reduce the computation performed in each iteration.

These examples illustrate how addressing loop inefficiencies can significantly impact program performance, especially in loops that execute a large number of times or operate on large data sets.

The next section that is going to covered from this chapter this week is **Section 5.5: Reducing Procedure Calls**.

# Section 5.5: Reducing Procedure Calls

# Reducing Procedure Calls

Reducing procedure calls is a critical strategy in optimizing program performance. Procedure calls, while essential for modular programming and code reuse, introduce overhead due to the need to save and restore state, pass parameters, and potentially disrupt the instruction pipeline. Minimizing the frequency or cost of these calls can significantly improve execution speed, especially in performance-critical sections of code.

#### Strategies for Reducing Procedure Calls

# • Inlining Functions

Function inlining replaces a call to a function with the function's body itself. This eliminates
the overhead associated with the call but at the cost of increasing code size. Compilers
often automatically inline small functions.

# • Loop Unrolling and Procedure Calls

- When loop unrolling, consider the impact of procedure calls within loops. If a procedure is called within a loop, unrolling might increase the number of calls. In such cases, inlining the procedure or moving the call outside the loop, if possible, can be beneficial.

#### • Avoiding Recursion

- Recursive procedures can be highly inefficient due to repeated procedure calls. Where possible, converting recursion to iteration can reduce these calls and improve performance.

## • Batching Work

- Grouping work into larger batches to be processed in a single procedure call, rather than making multiple calls with smaller amounts of work, can reduce overhead. This is particularly effective for operations like I/O processing or network communication.

## • Using Macros or Templates

- In some languages, macros or templates can replace function calls. These are expanded at compile time, similar to inlining, but without requiring the compiler's discretion. This approach should be used judiciously to avoid code bloat.

## • Reevaluating Algorithm Design

- In some cases, the need for frequent procedure calls indicates an inefficiency in the algorithm's design. Reevaluating the algorithm to reduce the dependency on procedure calls can lead to performance improvements.

# Example of Reducing Procedure Calls

Consider a scenario where a function is called within a loop to process elements of an array:

```
for (int i = 0; i < N; ++i) {
    processElement(array[i]);
    }
}</pre>
```

## Optimization: Inlining 'processElement'

If 'processElement' is a small function, inlining it either manually or relying on the compiler's optimization can eliminate the call overhead:

```
for (int i = 0; i < N; ++i) {
    // Inline body of processElement
    array[i] = array[i] * someCalculation();
4 }
5</pre>
```

### Optimization: Batching Work

If 'processElement' involves significant overhead (e.g., sending data over a network), consider restructuring the code to batch work:

```
processElements(array, N); // Single call processes all elements
```

These examples highlight how reducing procedure calls can be achieved through various optimization strategies, leading to significant performance improvements in certain contexts.

The last section that is being covered from this chapter this week is **Section 5.6: Eliminating Unneeded Memory References**.

# Section 5.6: Eliminating Unneeded Memory References

## Eliminating Unneeded Memory References

Eliminating unneeded memory references is a crucial optimization technique in improving program performance. Memory operations, especially reads and writes to main memory, are significantly slower than operations on CPU registers. Reducing the number of these memory references can decrease the execution time of a program by minimizing costly access delays and making better use of the CPU's internal resources.

## Strategies for Eliminating Unneeded Memory References

## • Register Allocation

- Allocating frequently accessed variables to registers reduces the need for memory accesses. Compilers typically perform register allocation but may be guided by hints or explicit requests in high-level languages (e.g., the 'register' keyword in C).

## • Common Subexpression Elimination

This optimization identifies expressions that are computed more than once with the same values and eliminates the redundancy by storing the result in a register and reusing it, thereby reducing memory reads.

## • Loop Invariant Code Motion

 Moving calculations that do not change across iterations of a loop outside the loop can reduce the number of memory accesses by computing the value once and storing it in a register for reuse.

## • Strength Reduction

- Replacing expensive operations with cheaper ones (e.g., replacing multiplication with addition) can sometimes reduce the need for memory references, as the cheaper operations may be more amenable to register-only computations.

#### • Minimizing Pointer Dereferencing

- Pointer dereferencing often involves memory accesses. Keeping dereferenced values in registers when possible, especially within loops, can minimize costly memory accesses.

#### Array Access Optimizations

 Access patterns in arrays can be optimized to improve cache utilization, reducing the need for main memory references. Techniques include loop interchange, blocking, and padding to avoid cache line conflicts.

# Example of Eliminating Unneeded Memory References

Consider a loop that calculates the sum of an array:

```
int sum = 0;
for (int i = 0; i < N; ++i) {
    sum += array[i];
}</pre>
```

## Optimization: Register Allocation

Ensuring that 'sum' and 'i' are kept in registers during the loop execution minimizes memory accesses. A compiler is likely to make this optimization automatically in such a simple case.

## Optimization: Loop Invariant Code Motion

If the loop contained invariant computations, such as calculating an offset based on 'i' that does not change within the loop, moving those calculations outside the loop would reduce memory references.

These examples demonstrate how reducing unneeded memory references can significantly impact the performance of critical code sections, especially in loops or frequently called functions.



# Week 7: (2/26 - 3/3)



# Optimizing Program Performance

# 7.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 5.7 Understanding Modern Processors
- Computer Systems: Chapter 5.8 Loop Unrolling
- Computer Systems: Chapter 5.9 Enhancing Parallelism
- Computer Systems: Chapter 5.10 Summary of Optimizations
- Computer Systems: Chapter 5.11 Some Limiting Factors
- Computer Systems: Chapter 5.12 Understanding Memory Performance
- Computer Systems: Chapter 5.13 Life In The Real World
- Computer Systems: Chapter 5.14 Identifying and Eliminating Performance Bottlenecks

#### 7.0.2 Piazza

Piazza discussions are optional this week. (3/5/24)

### 7.0.3 Lectures

The lecture videos for this week are:

- Optimization Exploiting Instruction Level Parallelism  $\approx 45$  min.
- Understanding Processor Performance Branches And Vectors  $\approx 21$  min.
- Optimization Optimization Blockers  $\approx 18$  min.

The lecture notes for this week are:

- Memory Hierarchy I Lecture Notes
- Memory Hierarchy II Lecture Notes

# 7.0.4 Assignments

The assignment for this week is:

- Attack Lab (3/12/24)
- Attack Lab Interview (3/12/24)

## 7.0.5 Quiz

The quizzes for this week are:

• Chapter 4 & 5 Quiz • Chapter 4 & 5 Finalized Quiz (3/5/24)

#### 7.0.6 Exam

The exam for this week is:

- Practice Exam 2
- Exam 2 Exam 2 Machine Level Representation (2/28/24)

# 7.0.7 Chapter Summary

The chapter that we are covering this week is **Chapter 5: Optimizing Program Performance**. The first section that we will be covering from this chapter is **Section 5.7: Understanding Modern Processors**.

## Section 5.7: Understanding Modern Processors

## **Understanding Modern Processors**

Understanding modern processors is essential for optimizing program performance. Modern CPUs are complex systems designed to maximize execution speed through various architectural features such as pipelining, superscalar execution, out-of-order execution, and sophisticated memory hierarchies. Optimizing software requires knowledge of these features and how they affect program execution.

### **Key Features of Modern Processors**

## • Pipelining

Pipelining divides instruction execution into discrete stages, allowing multiple instructions
to be processed simultaneously but at different stages. This increases instruction throughput
but introduces challenges such as handling data and control hazards.

### • Superscalar Architecture

 Superscalar processors can issue multiple instructions per clock cycle from a single thread, increasing the utilization of available resources and improving performance.

#### • Out-of-Order Execution

To maximize resource utilization and minimize idle cycles, modern processors can execute instructions out of the order they appear in the program, as long as data dependencies are respected. This requires sophisticated tracking of instruction dependencies and results.

### • Speculative Execution

 Processors may execute instructions ahead of their actual place in the execution sequence, guessing the outcome of branches (branch prediction) to fill the pipeline optimally. Incorrect guesses require rolling back speculative execution, which can impact performance.

### • Memory Hierarchy

– Modern CPUs use a hierarchy of memory storage (registers, cache levels, main memory) to balance the trade-off between access speed and storage capacity. Understanding and optimizing for cache usage can significantly affect performance.

## • Multithreading and Parallel Execution

Features like hardware threads (Hyper-Threading in Intel processors) and multicore architectures allow for parallel execution of threads or processes, offering substantial performance improvements for parallelizable workloads.

## Implications for Program Optimization

## • Algorithm and Data Structure Choice

 Selecting algorithms and data structures that make efficient use of the memory hierarchy and that are amenable to parallel execution can lead to significant performance gains.

## • Minimizing Cache Misses

 Organizing data access patterns to maximize cache hits (temporal and spatial locality) reduces slow memory access, improving performance.

## • Reducing Branch Penalties

 Writing code to minimize the cost of branch mispredictions, such as by avoiding unnecessary branches or optimizing branch predictability, can enhance execution speed.

## • Exploiting Instruction-Level Parallelism

Coding practices that allow the compiler or processor to identify and exploit parallelism
within a single thread can improve performance without requiring explicit parallel programming.

# • Parallel Programming

 For applications that can be divided into concurrent tasks, using parallel programming models (e.g., OpenMP, MPI) to distribute work across multiple cores or processors can achieve substantial performance improvements.

### Considerations for Modern Processors

When optimizing for modern processors, it's crucial to profile and understand the application's behavior, identify bottlenecks, and apply targeted optimizations that leverage the processor's features. Tools like profilers and performance counters can provide insights into how effectively a program is utilizing the CPU, guiding optimization efforts for maximum performance gain.

The next section that will be covered from this chapter this week is **Section 5.8: Loop Unrolling**.

### Section 5.8: Loop Unrolling

## Loop Unrolling

Loop unrolling is an optimization technique that aims to improve the execution speed of a program by reducing the overhead associated with looping constructs. It involves duplicating the loop body multiple times within a single iteration, thereby decreasing the total number of iterations along with the checks and updates to the loop counter and termination condition. This method can lead to performance enhancements, especially in loops that perform minimal work per iteration, by minimizing loop overhead and increasing the workload per iteration.

## **Key Points of Loop Unrolling**

- Reduced Loop Overhead: By decreasing the frequency of branch instructions (loop condition checks), loop unrolling can potentially improve the efficiency of the instruction pipeline.
- Increased Instruction-Level Parallelism: Executing more operations per iteration creates additional opportunities for the compiler or processor to optimize the execution of independent instructions concurrently.
- Improved Cache Utilization: Unrolling loops can alter the pattern of memory accesses, potentially enhancing cache performance for certain data access patterns.
- Trade-offs and Limitations: Although loop unrolling can boost performance, it also increases the code size, which may result in cache misses if the unrolled loop body becomes too large. The effectiveness of loop unrolling performed by compilers varies, and manual unrolling requires careful consideration of the loop's characteristics and the target architecture.

## Example of Loop Unrolling

### **Rolled Version**

Consider a simple loop that increments each element of an array:

```
1  for (int i = 0; i < N; i++) {
2     array[i] += 1;
3  }
4</pre>
```

### **Unrolled Version**

To unroll this loop by a factor of 4, manually increase the number of operations per iteration, effectively reducing the total number of iterations:

```
for (int i = 0; i < N; i += 4) {
        array[i] += 1;
        if (i + 1 < N) array[i + 1] += 1;
        if (i + 2 < N) array[i + 2] += 1;
        if (i + 3 < N) array[i + 3] += 1;
}</pre>
```

In this unrolled version, four array elements are processed in each iteration, reducing the number of loop condition checks and increments of the loop counter. The if statements ensure that the loop handles cases where N is not a multiple of 4, preventing out-of-bounds access.

#### Considerations

- The effectiveness of loop unrolling depends on the specific characteristics of the loop and the hardware it runs on. It is particularly beneficial for loops with a high iteration count and simple loop bodies.
- Excessive unrolling can lead to an increase in code size, which might negatively impact performance by causing more cache misses. Finding a balance or relying on compiler optimizations is crucial.
- Modern compilers can perform loop unrolling based on heuristics. However, manual unrolling
  might still offer benefits in performance-critical sections, especially when the programmer has
  insights that the compiler cannot automatically deduce.

When optimizing loops through unrolling, it's essential to consider the impact on code size and cache behavior. Profiling and experimenting with different unrolling factors can help find the optimal balance for a given loop and target hardware. Manual loop unrolling should be applied judiciously, as overly aggressive unrolling may lead to diminished returns or even performance degradation due to increased code size and cache pressure.

The next section that is being covered from this chapter this week is **Section 5.9: Enhancing Parallelism**.

## Section 5.9: Enhancing Parallelism

## **Enhancing Parallelism**

Enhancing parallelism is a key strategy in optimizing program performance, leveraging the capability of modern processors to execute multiple operations simultaneously. Parallelism can be exploited at various levels, from instruction-level parallelism within a single processor core, to data-level and task-level parallelism across multiple cores or processors. By identifying and exploiting parallelism in an application, developers can significantly improve execution speed and efficiency.

## Key Strategies for Enhancing Parallelism

## • Instruction-Level Parallelism (ILP)

- ILP refers to the ability of a processor to execute multiple instructions simultaneously. Techniques such as pipelining, superscalar execution, and out-of-order execution are used to exploit ILP. Optimizing code to increase the independence of adjacent instructions can enhance ILP.

# • Data-Level Parallelism (DLP)

- DLP exploits the parallelism inherent in operations that can be performed on different pieces of data simultaneously. SIMD (Single Instruction, Multiple Data) instructions, available on many modern processors, are a key tool for exploiting DLP.

#### • Task-Level Parallelism (TLP)

- TLP involves executing different computational tasks in parallel, utilizing multiple processing units. Techniques for exploiting TLP include multithreading and multiprocessing, which can be implemented using various parallel programming models and languages.

## • Loop Parallelism

Many applications contain loops that iterate over data structures or perform computations
that are independent of each other. Identifying and restructuring loops to enable parallel
execution of iterations can significantly improve performance.

## • Parallel Algorithms and Data Structures

Choosing algorithms and data structures that are designed for parallel execution can greatly
enhance parallelism. This might involve using concurrent data structures or designing
algorithms that minimize dependencies between computational tasks.

#### Considerations for Enhancing Parallelism

#### Overhead and Scalability

 Introducing parallelism can incur overhead, such as communication or synchronization costs between threads or processes. It's essential to balance the benefits of parallel execution against these overheads to achieve optimal performance.

## • Data Dependencies

 Data dependencies can limit the ability to parallelize code. Analyzing and restructuring code to reduce or eliminate these dependencies can unlock greater parallelism.

### • Hardware Considerations

The specific characteristics of the hardware, such as the number of cores, the presence of SIMD instructions, and the memory hierarchy, can significantly impact the effectiveness of parallelism enhancements.

## Enhancing Parallelism in Practice

A common example of enhancing parallelism is the parallelization of loops in numerical computations. By identifying loops where iterations do not depend on each other, developers can use parallel programming constructs (e.g., OpenMP in C/C++) to distribute the iterations across multiple threads or cores, achieving significant performance gains on multicore processors.

The next section that is being covered from this chapter this week is **Section 5.10: Summary of Results for Optimizing Combining Code**.

## Section 5.10: Summary of Results for Optimizing Combining Code

## Summary of Results for Optimizing Combining Code

Optimizing combining code focuses on techniques that enhance the execution efficiency of programs by merging operations, minimizing redundant calculations, and exploiting the compiler's ability to generate optimized code. This process often involves rethinking how data is accessed, computed, and stored, aiming to reduce the overall computational workload and memory usage. Through careful analysis and restructuring of code, significant improvements can be achieved in both performance and resource utilization.

## Key Findings in Optimizing Combining Code

## • Minimizing Redundant Operations

Identifying and eliminating redundant operations that perform the same computation
multiple times can significantly reduce the execution time. This often involves caching the
results of expensive operations for reuse.

## • Exploiting Compiler Optimizations

- Writing code in a manner that enables the compiler to apply advanced optimizations effectively can lead to better performance. This includes adhering to best practices that enhance the predictability and analyzability of code.

#### • Consolidating Data Accesses

 Restructuring code to consolidate data accesses and minimize memory bandwidth usage can improve cache efficiency and reduce memory latency impacts.

#### Fusing Loops and Operations

 Combining multiple loops into a single loop and fusing operations when possible reduces loop overhead and can improve data locality, enhancing performance.

# • Vectorization and Parallel Execution

- Leveraging vector instructions (SIMD) and parallel execution models can significantly speed up operations that are amenable to data-level parallelism.

## Challenges in Optimizing Combining Code

## • Maintaining Code Clarity and Maintainability

 Balancing optimization efforts with the need to maintain clear and maintainable code is a key challenge. Over-optimization can lead to complex code that is difficult to understand and maintain.

# • Understanding Compiler Behavior

 Predicting how a compiler will optimize certain code patterns can be challenging, requiring in-depth knowledge of the compiler and the target architecture.

## • Platform-Specific Optimizations

- Optimizations that yield significant improvements on one hardware platform may not be effective on another, making cross-platform optimization a complex task.

# Example and Impact

An example of optimizing combining code is the transformation of separate loops that iterate over the same data set into a single loop that performs multiple operations per iteration. This change can reduce loop overhead and improve cache utilization. In practice, such optimizations can lead to measurable performance gains, especially in data-intensive applications where memory bandwidth and latency significantly impact overall execution time.

Consider the following code with separate loops:

```
1  // Loop 1: Incrementing each element
2  for (int i = 0; i < N; i++) {
3     array1[i] += 1;
4  }
5
6  // Loop 2: Doubling each element
7  for (int i = 0; i < N; i++) {
8     array2[i] *= 2;
9  }
10</pre>
```

Optimized version with combined loop:

In this optimized version, both operations on two different arrays are performed within a single loop iteration. This reduces the number of loop iterations from 2N to N and decreases the loop control overhead. Furthermore, if 'array1' and 'array2' are accessed sequentially, this approach may improve cache locality and reduce cache misses, enhancing overall performance.

In summary, optimizing combining code is a multifaceted approach that requires a deep understanding of both the software being optimized and the underlying hardware. When applied judiciously, it can yield substantial improvements in program performance and efficiency.

The next section that is being covered from this chapter this week is **Section 5.11: Some Limiting Factors**.

## Section 5.11: Some Limiting Factors

## Some Limiting Factors

In the context of optimizing program performance, understanding the limiting factors that can hinder optimization efforts is crucial. These factors, often inherent to the software design, hardware architecture, or execution environment, can impose constraints on how much performance can be improved. Recognizing and addressing these limiting factors is key to effective optimization.

## **Key Limiting Factors in Program Optimization**

#### · Amdahl's Law

- Amdahl's Law highlights the limits of performance gains from parallelization. It states that the maximum improvement achievable by enhancing a particular aspect of a system is limited by the fraction of time the enhanced aspect is utilized. This principle underlines the importance of identifying and optimizing the most time-consuming parts of a program.

# • Memory Bandwidth and Latency

The speed at which data can be transferred between the CPU and memory (bandwidth) and the delay in transferring data (latency) are significant limiting factors. Even with a highly optimized CPU utilization, memory access times can bottleneck overall performance.

## • Data Dependencies

 Data dependencies within a program can restrict the ability to parallelize or reorder instructions for better optimization. True dependencies (read-after-write), anti-dependencies (write-after-read), and output dependencies (write-after-write) need to be carefully managed.

## • Branch Prediction and Misalignment

The efficiency of branch prediction mechanisms in modern processors and the potential for branch mispredictions can limit performance improvements. Code that frequently causes mispredictions can suffer from pipeline stalls and reduced execution speed.

### • Instruction-Level Parallelism (ILP) Limits

The potential for ILP is limited by factors such as the availability of independent instructions that can be executed in parallel, the number of execution units in the CPU, and the overhead of managing parallel execution.

#### • Hardware and Platform Constraints

- The specific characteristics and capabilities of the hardware platform, such as the number and type of CPU cores, presence of specialized processing units (e.g., GPUs), and the memory hierarchy, can significantly influence optimization strategies and their effectiveness.

### Considerations for Overcoming Limiting Factors

# • Comprehensive Profiling

 Profiling and analyzing application performance in detail can help identify bottlenecks and the most impactful areas for optimization, guiding efforts where they can provide the most benefit.

#### • Algorithmic and Data Structure Optimization

 Choosing or designing algorithms and data structures that are well-suited to the problem domain and hardware characteristics can mitigate some of the limiting factors, especially those related to memory access patterns and data dependencies.

## • Parallelization and Concurrency

- Exploring opportunities for parallel execution at the task, data, or instruction level can help overcome some of the inherent limitations of sequential execution.

# • Hardware-Specific Optimizations

- Tailoring optimizations to leverage the unique features and capabilities of the target hardware platform can help maximize performance gains.

# Addressing Limiting Factors

Addressing the limiting factors in program optimization requires a balanced approach that considers both software and hardware aspects. Effective optimization strategies often involve a combination of code restructuring, algorithmic changes, parallelization, and leveraging hardware features. While some limiting factors can be mitigated, others must be accepted as inherent constraints that guide optimization decisions.

The next section that is being covered from this chapter this week is **Section 5.12: Understanding Memory Performance**.

# Section 5.12: Understanding Memory Performance

## **Understanding Memory Performance**

Understanding memory performance is essential for optimizing program performance, as memory access times significantly impact the overall speed of program execution. The memory hierarchy in modern computing systems, from registers and cache to main memory and disk storage, presents a range of access speeds and storage capacities. Optimizing memory performance involves arranging data access patterns and algorithms to leverage this hierarchy effectively, minimizing the time spent waiting for data to be loaded or stored.

#### **Key Aspects of Memory Performance**

## • Memory Hierarchy

Balancing the trade-off between access speed and storage capacity, faster storage (CPU registers and cache) is limited in size, while slower storage (main memory and disks) offers greater capacity. Code optimization to prioritize data access in the faster layers can significantly boost performance.

#### • Cache Utilization

 Caches reduce access times by storing frequently accessed data and instructions close to the processor. Maximizing cache effectiveness through spatial and temporal locality can minimize slow main memory accesses.

## Data Locality

 Spatial locality (accessing closely located data elements) and temporal locality (repeated access to the same data elements) optimize memory performance.

#### • Prefetching

 Loading data into cache before it's needed can anticipate future accesses, managed by the compiler or explicitly within the application to decrease memory access delays.

## • Memory Access Patterns

 Aligning memory access patterns with cache line boundaries and avoiding cache contention enhances cache efficiency.

## Challenges in Optimizing Memory Performance

## • Predicting Cache Behavior

 Factors like cache size, associativity, and replacement policies add complexity to cache behavior prediction and optimization.

## Memory Bandwidth and Latency

 Memory accesses' bandwidth and latency can bottleneck performance, especially in dataheavy applications.

## • Concurrency and Synchronization

 Managing efficient memory access while handling concurrency and synchronization in multi-threaded applications introduces optimization complexity.

# Optimization Example

An optimization example for memory performance involves restructuring an algorithm to enhance data locality. Consider optimizing matrix multiplication by ensuring data access respects the storage order.

This adjustment aligns the inner loop with row-major order access, reducing cache misses and improving execution speed by leveraging spatial locality more effectively.

Addressing memory performance is a complex challenge that requires an in-depth understanding of both theoretical memory hierarchies and the practical implications of hardware characteristics. Careful analysis and profiling are essential to identify bottlenecks and areas for improvement.

The next section that is being covered from this chapter this week is **Section 5.13: Life in the Real World: Performance Improvement Techniques**.

## Section 5.13: Life in the Real World: Performance Improvement Techniques

## Life in the Real World: Performance Improvement Techniques

In the real world, optimizing program performance extends beyond theoretical models and lab benchmarks, involving a comprehensive understanding of both hardware and software environments. Real-world performance improvement techniques must account for the diverse and dynamic nature of computing environments, user interactions, and system loads. Practical optimization often requires a balance between maximizing efficiency and maintaining the adaptability and maintainability of code.

#### Key Techniques for Real-World Performance Improvements

## • Profiling and Benchmarking

- Using profiling tools to identify bottlenecks and benchmarking different optimization strategies under realistic workloads are essential steps in the real-world optimization process. This empirical approach helps focus efforts on the areas that will yield the most significant performance gains.

## • Algorithmic Optimization

Selecting or designing algorithms that are well-suited to the problem domain and hard-ware capabilities can lead to substantial performance improvements. This might involve leveraging data structures that minimize memory access times or algorithms that can be easily parallelized.

## • Memory Access Optimization

 Optimizing the way a program accesses memory, by enhancing data locality and reducing cache misses, is crucial for high-performance applications. Techniques such as loop tiling or blocking can make significant differences in memory-bound applications.

## · Concurrency and Parallelism

 Exploiting parallel hardware features through multithreading, multiprocessing, or vectorization can dramatically improve performance. Effective use of concurrency also requires careful attention to synchronization to avoid contention and ensure correct program execution.

## • Energy Efficiency

— In many real-world scenarios, optimizing for energy efficiency is as important as optimizing for speed, especially in mobile and embedded systems. Techniques that reduce CPU usage, minimize wake locks, and leverage low-power states can improve battery life while maintaining performance.

## Challenges in Real-World Optimization

#### Hardware and Software Diversity

Optimizing for a wide range of hardware platforms and operating systems can complicate
the optimization process, requiring adaptable and sometimes platform-specific optimization
strategies.

## · Maintainability and Readability

 Ensuring that optimized code remains maintainable and readable is crucial for long-term project sustainability. Overly aggressive optimization can lead to "spaghetti code" that is difficult to understand, debug, and extend.

#### • Diminishing Returns

The law of diminishing returns applies to performance optimization; beyond a certain point, additional efforts yield increasingly smaller gains. Prioritizing and knowing when to stop is crucial for efficient resource allocation.

# Example and Impact

An example of real-world performance improvement is optimizing a web application's response time by minimizing server-side computation and optimizing database queries. Additionally, client-side rendering can be made more efficient through code minification and image compression techniques.

## Optimized Database Query

```
-- Original query that fetches user data inefficiently

SELECT * FROM users WHERE last_login < '2021-01-01';

-- Optimized query with an indexed search

SELECT id, username, email FROM users WHERE last_login < '2021-01-01' AND active = 1;
```

This optimization reduces the amount of data transferred and processed, leveraging database indexes to speed up the query. Such optimizations can significantly improve the responsiveness of a web application, enhancing the user experience.

Real-world performance optimization is a multifaceted challenge that requires a balance between deep technical knowledge and practical considerations. Successful optimization efforts focus on measurable improvements, maintain code quality, and consider the broader implications of changes on system behavior and user experience.

The last section that is being covered from this chapter this week is **Section 5.14: Identifying and Eliminating Performance Bottlenecks**.

## Section 5.14: Identifying and Eliminating Performance Bottlenecks

## Identifying and Eliminating Performance Bottlenecks

Identifying and eliminating performance bottlenecks is a crucial step in the optimization process, allowing developers to target specific areas of an application that are limiting overall performance. A bottleneck occurs when a particular component or operation significantly slows down the execution of a program, causing the entire system to operate below its potential efficiency. Effective optimization requires a systematic approach to pinpoint these critical areas and apply targeted enhancements.

## Strategies for Identifying Performance Bottlenecks

## • Profiling

 Profiling tools analyze a program's execution to identify where the most time or resources are being consumed. This data-driven approach helps focus optimization efforts on the parts of the program that will yield the greatest performance improvements.

# • Monitoring and Logging

 Systematic monitoring and logging of application performance metrics can reveal trends and patterns that indicate potential bottlenecks, especially in complex, distributed systems.

#### • Load Testing

 Simulating high usage scenarios can help identify bottlenecks that may not be evident under normal conditions. Load testing is particularly useful for web applications and services where user demand can vary significantly.

## Code Review and Static Analysis

 Manual code review and static analysis tools can help identify common coding patterns and practices that may lead to performance issues, such as unnecessary database queries or inefficient data structures.

#### Techniques for Eliminating Performance Bottlenecks

# • Optimizing Algorithms and Data Structures

 Replacing inefficient algorithms and data structures with more efficient alternatives can have a dramatic impact on performance, especially in computationally intensive applications.

## • Improving Database Performance

 Optimizing queries, indexing data appropriately, and normalizing database structures can significantly reduce data access times and improve application responsiveness.

## • Reducing Network Latency

- Techniques such as caching, content delivery networks (CDNs), and data compression can reduce the impact of network latency in distributed applications.

## Concurrency and Parallelism

 Leveraging multi-threading and parallel processing can alleviate bottlenecks in applications by distributing workloads across multiple processors or cores.

## Resource Allocation and Management

- Optimizing the use of system resources (CPU, memory, disk I/O) can eliminate bottlenecks related to resource contention or inefficient resource usage.

### Example and Impact

Consider an application experiencing slow response times due to database queries being a significant bottleneck.

## Original Database Query

```
1 SELECT * FROM orders WHERE customer_id = 123;
2
```

#### Optimized Database Query

```
1 -- Assuming an index exists on customer_id
2 SELECT order_id, order_date FROM orders WHERE customer_id = 123;
3
```

This optimization reduces the amount of data fetched by selecting only the necessary columns and leveraging an index on the 'customer\_id' column, resulting in faster query execution and improved application responsiveness.

Identifying and eliminating performance bottlenecks is an iterative and ongoing process that requires careful analysis and strategic optimizations to enhance application performance effectively.



# Week 8: (3/4 - 3/10)



# The Memory Hierarchy

# 8.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 6.1 Storage Technology
- Computer Systems: Chapter 6.2 Locality
- Computer Systems: Chapter 6.3 The Memory Heirarchy
- Computer Systems: Chapter 6.4 Cache Memories
- Computer Systems: Chapter 6.5 Writing Cache Friendly Code
- Computer Systems: Chapter 6.6 Putting It Together

## 8.0.2 Piazza

Piazza discussions are optional this week. (3/12/24)

#### 8.0.3 Lectures

The lecture videos for this week are:

- Memory Memory Storage Technology  $\approx 37$  min.
- Memory Memory Hierarchy  $\approx 15$  min.
- Memory Introduction To Caches & Memory Trends  $\approx 28$  min.
- The Memory Hierarchy Cache Memory Organization  $\approx 34$  min.
- The Memory Hierarchy Memory Mountain & Prefetching  $\approx 10$  min.
- The Memory Hierarchy Writing Cache Friendly Code  $\approx 22$  min.

The lecture notes for this week are:

- Memory Hierarchy III Lecture Notes
- Cache Memories I Lecture Notes
- Cache Memories II Lecture Notes
- Cache Friendly Code Lecture Notes

## 8.0.4 Assignments

The assignment for this week is:

- Attack Lab (3/12/24)
- Attack Lab Interview (3/12/24)

## 8.0.5 Quiz

The quizzes for this week are:

• Chapter 6 Quiz • Chapter 6 Finalized Quiz (3/19/24)

## 8.0.6 Chapter Summary

The chapter that is being covered this week is **Chapter 6: The Memory Hierarchy**. The first section that is being covered from this chapter this week is **Section 6.1: Storage Technologies**.

# Section 6.1: Storage Technologies

## Storage Technologies

Storage technologies are essential components of the memory hierarchy, each offering a unique balance of capacity, speed, cost, and volatility. They range from fast, expensive, and volatile memory near the top of the hierarchy, such as registers and cache, to slower, cheaper, and non-volatile storage at the bottom, like hard disk drives (HDDs) and solid-state drives (SSDs).

### **Key Points of Storage Technologies**

- **Registers**: The fastest type of memory, used directly by the CPU to store the data it is currently processing. However, they have very limited capacity.
- Cache Memory: Slightly slower than registers but still very fast. Cache memory stores copies of frequently accessed data to speed up access times. It comes in multiple levels (L1, L2, L3) with increasing size and decreasing speed.
- Main Memory (RAM): Volatile memory used for storing data and programs that are in active use. It offers a good balance between speed and capacity but loses data when power is off.
- Solid-State Drives (SSD): Fast, non-volatile storage that uses flash memory. SSDs have no moving parts, which makes them faster and more reliable than HDDs but more expensive per gigabyte.
- Hard Disk Drives (HDD): Traditional storage devices with mechanical parts. They offer large storage capacities at a lower cost than SSDs but are slower and more prone to physical damage.
- Optical Drives and Media: Including CDs, DVDs, and Blu-ray discs, these are used for distributing and storing large amounts of data. They are relatively slow and have been largely superseded by flash storage for many applications.

# Example of Storage Technologies Usage

## Comparing SSD and HDD in Personal Computers

In a personal computing context, choosing between an SSD and an HDD can significantly affect the system's performance:

```
1 - Boot Time:
2 SSD: Seconds
3 HDD: Minutes
4
5 - Data Transfer Speed:
6 SSD: Up to 550 MB/s
7 HDD: 80-160 MB/s
8
9 - Noise and Vibration:
10 SSD: None
11 HDD: Noticeable
```

SSDs provide faster boot times, quicker data access, and are silent in operation, making them preferable for performance-sensitive applications. However, HDDs may still be used for bulk storage due to their cost-effectiveness.

#### Considerations

- The choice of storage technology greatly influences the overall system performance and cost. High-performance applications benefit from faster storage technologies like SSDs and cache memory.
- The evolution of storage technologies continues to address the trade-offs between speed, size, cost, and durability. Future advancements may further blur the lines between these categories.
- Environmental conditions, such as temperature and humidity, can affect the longevity and reliability of storage devices, particularly mechanical HDDs.

The next section that is being covered from this chapter this week is **Section 6.2: Locality**.

## Section 6.2: Locality

#### Locality

Locality in computer systems refers to the pattern of memory access by a processor, where certain locations or groups of locations are accessed more frequently over a period of time. This concept is pivotal in optimizing computer architecture and software design, as it leverages the predictable patterns of memory access to enhance overall system performance. Locality is primarily categorized into two types: temporal locality and spatial locality.

#### Temporal Locality

Temporal locality occurs when a program accesses the same memory location multiple times within a relatively short timeframe. This pattern suggests that once a piece of data is accessed, it is likely to be accessed again soon. Temporal locality is exploited by caching mechanisms, where recently accessed data is stored in a cache close to the processor. Since accessing data from the cache is significantly faster than accessing it from main memory, programs that exhibit strong temporal locality can achieve substantial performance improvements.

#### **Spatial Locality**

Spatial locality refers to the pattern where if a memory location is accessed, locations physically near it are likely to be accessed soon after. This tendency is based on the way programs are typically

structured and how data is often grouped and accessed in blocks or sequences. Exploiting spatial locality involves prefetching blocks of data into the cache before they are actually needed, thus reducing the number of slow main memory accesses. This is effective because modern processors can fetch and cache multiple adjacent memory locations with a single memory access operation, improving the efficiency of data retrieval.

## **Key Points of Locality**

- **Temporal Locality**: Important for the design of cache policies that determine which data to store and replace. It ensures that data which is likely to be reused in the near future remains accessible at high speeds.
- Spatial Locality: Influences the organization of memory and the prefetching strategies of processors. By anticipating the need for nearby data, systems can preload this information to avoid delays associated with accessing main memory.
- Impact on System Design: Both types of locality have profound implications for the architecture of computer systems, influencing the design of memory hierarchies, caching algorithms, and even the physical layout of memory.
- **Programming for Locality**: Programmers can significantly impact the performance of their applications by structuring data and code in ways that enhance locality. This includes practices like looping over data in contiguous blocks and organizing data structures to be locality-friendly.

# Example of Enhancing Locality

# Improving Spatial Locality in Matrix Multiplication

Consider the algorithmic task of multiplying two matrices. The traditional approach may not optimally exploit spatial locality, especially if one matrix is accessed row-wise and the other column-wise. This mismatch can lead to inefficient use of the cache, as the data needed for subsequent operations may not be physically adjacent in memory.

```
To enhance spatial locality, one can adopt a blocking technique, where the
matrices are divided into smaller submatrices or blocks. This reorganization allows
for the submatrices to be loaded into cache, ensuring that the subsequent accesses
to these blocks are cache hits, significantly improving performance.
```

This optimized approach not only reduces the number of cache misses but also aligns with the principles of spatial locality by ensuring that data accessed in succession is located close together in memory.

### Considerations

- The exploitation of locality is a key strategy in the design of efficient computing systems, from low-level hardware architecture to high-level software development.
- The balance between temporal and spatial locality optimizations must be considered, as focusing too much on one can sometimes diminish the other.
- Advanced compiler optimizations and programming techniques can further exploit locality, but understanding the underlying hardware's memory hierarchy is essential for maximizing these benefits.

The next section that is being covered from this chapter this week is **Section 6.3: The Memory Hierarchy**.

## Section 6.3: The Memory Hierarchy

## The Memory Hierarchy

The Memory Hierarchy is a structured framework of computer memory that consists of multiple layers, each with varying speeds, sizes, and costs. This hierarchy is designed to bridge the gap between the fastest and most costly types of memory (like CPU registers) and the slowest but most affordable ones (like hard disk drives). The core idea behind the memory hierarchy is to provide the illusion of a large, fast, and cheap memory system by strategically using different types of memory storage at different levels.

## Levels of the Memory Hierarchy

From fastest and most expensive to slowest and least expensive, the typical levels of the memory hierarchy include:

- Registers: Located inside the CPU, registers are the fastest form of memory for data storage and retrieval but have very limited storage capacity.
- Cache Memory: A small-sized but fast memory located close to the CPU cores. It is used to store copies of frequently accessed data from main memory, significantly reducing data access times. Cache memory is usually divided into three levels (L1, L2, and L3), with L1 being the fastest.
- Main Memory (RAM): A larger pool of memory that stores data and programs that are currently in use. It is slower and cheaper per bit than cache memory and registers but offers much greater capacity.
- Secondary Storage: Non-volatile storage devices like hard disk drives (HDDs), solid-state drives (SSDs), and optical media. These storage solutions provide substantial storage capacity at a lower cost but have much slower access times compared to RAM and cache.
- Tertiary Storage and Off-line Storage: Includes storage solutions like tape drives, used for backup and archival purposes. These storage media offer the lowest cost per bit and are used for data that is rarely accessed.

## Key Principles of the Memory Hierarchy

- Locality of Reference: Programs tend to access a relatively small portion of their address space at any given time. The memory hierarchy exploits this by keeping active data and instructions close to the processor.
- Cost-Performance Trade-off: By combining various types of storage media, the memory hierarchy aims to achieve a balance between cost, size, and speed, optimizing overall system performance.
- Automatic Management: Modern computer systems automatically manage data movement between the levels of the memory hierarchy. For example, data is promoted to higher levels (closer to the CPU) as it is accessed more frequently.

# Example of Memory Hierarchy Utilization

# Accessing Data in a Computing Task

When a processor needs to access data, it first checks the fastest level of memory, the registers. If the data is not found there (a condition known as a cache miss), it proceeds to check the L1 cache, then L2, and so on down the hierarchy until the data is found.

```
- If data is found in an upper level, it is used immediately (cache hit).
- If data must be retrieved from a lower level (e.g., main memory or secondary storage), it incurs a significant time penalty but is then stored in upper levels to speed up future accesses.
```

This process exemplifies how the memory hierarchy efficiently manages data access, reducing average access time by leveraging faster memory levels for more frequently accessed data.

#### Considerations

- The efficiency of the memory hierarchy is crucial for overall system performance. Optimizations at each level can lead to significant performance gains.
- The specific design and implementation of the memory hierarchy can vary significantly between different computing systems, influenced by their particular performance requirements and application domains.
- Software developers can influence the efficiency of memory usage through programming techniques that take advantage of the memory hierarchy, such as optimizing algorithms for cache efficiency.

The next section that is being covered from this chapter this week is **Section 6.4: Cache Memories**.

## Section 6.4: Cache Memories

#### Cache Memories

Cache memories are a critical component of the memory hierarchy, designed to bridge the significant speed gap between the central processing unit (CPU) and main memory (RAM). By storing frequently accessed data and instructions close to the CPU, caches reduce the average time to access memory, thereby enhancing overall system performance. Cache memories are smaller and faster than RAM but offer limited storage capacity due to their higher cost per bit.

## Structure and Types of Cache

Cache memories are typically structured in multiple levels (L1, L2, and L3):

- L1 Cache: The fastest and smallest cache directly integrated into the processor's core. It is split into separate sections for data and instructions (d-cache and i-cache) to optimize access.
- L2 Cache: Larger and slightly slower than L1, L2 cache serves as a bridge between the ultra-fast L1 cache and the larger but slower L3 cache. It can be exclusive or shared among cores, depending on the processor design.
- L3 Cache: The largest but slowest level of cache, often shared across all cores in a processor. L3 cache acts as a last resort before the CPU has to access the significantly slower main memory.

## **Cache Operation**

Caches operate on the principles of locality of reference, utilizing both spatial and temporal locality to predict and pre-load data that the CPU is likely to need soon. The basic operations of cache memory include:

- Reading Data: When the CPU requests data, the cache is the first place checked. If the data is present (a cache hit), it is delivered quickly to the CPU. If not (a cache miss), the data is fetched from a lower level of the memory hierarchy.
- Writing Data: Writing policies, such as write-through (data is written to both the cache and the main memory) and write-back (data is only written to the cache and later synchronized with the main memory), affect performance and data coherence.
- Cache Coherence: In multi-core systems, ensuring that all caches have the most recent data is crucial. Protocols like MESI (Modified, Exclusive, Shared, Invalid) help maintain consistency across caches.

## Cache Performance Optimization

Optimizing cache performance involves several strategies, including:

- Cache Size: Larger caches can store more data but have diminishing returns due to increased cost and complexity.
- Associativity: The method by which cache locations are selected for storing data. Higher associativity reduces cache misses but increases complexity and access time.
- Block Size: The unit of data exchange between cache and main memory. Optimal block size depends on the access pattern of applications.
- Replacement Policies: Algorithms to decide which cache entries to replace, like Least Recently Used (LRU), aim to minimize cache misses by keeping the most relevant data in cache.

## Example of Cache Utilization

### Web Browsing Cache Example

In web browsing, the browser cache stores copies of recently accessed web pages. When a user revisits a page, the browser first checks the cache:

```
- If the page is in the cache (hit), it loads almost instantly.
- If the page is not in the cache (miss), it is fetched from the internet, which is slower.
```

This caching mechanism significantly speeds up web browsing by reducing load times for frequently visited pages.

## Considerations

- The effectiveness of a cache memory system is highly dependent on the specific workloads and access patterns of the applications running on the system.
- Cache memories are a critical part of system design for both hardware architects and software developers, who must understand and optimize for cache behavior to achieve high performance.
- Emerging technologies and approaches to cache design continue to evolve, aiming to address the challenges of increasing processor speeds and the need for more efficient memory access patterns.

The next section that is being covered from this chapter this week is **Section 6.5: Writing Cache-Friendly Code**.

# Section 6.5: Writing Cache-Friendly Code

AAMYATA TO QAI YMAN

# Writing Cache-Friendly Code

Writing cache-friendly code involves understanding and leveraging the memory hierarchy, specifically the cache, to improve software performance. By aligning code and data structures with the principles of cache operation, programmers can significantly reduce cache misses, leading to faster execution times. Key strategies include optimizing for spatial and temporal locality, minimizing cache line evictions, and organizing data access patterns to align with cache design.

## **Key Strategies for Cache Optimization**

- Maximize Temporal Locality: Access data and instructions as close together in time as possible if they will be reused, to keep them in the cache.
- Enhance Spatial Locality: Organize data so that elements accessed closely together in time are also stored closely together in memory.
- Loop Interchange: Adjust the order of nested loops to access data in memory sequentially, reducing cache line misses.
- Block or Tile Loops: Break down large data sets into smaller chunks that fit into the cache, processing each chunk at a time to prevent cache evictions.
- Avoid Unnecessary Data Structure Padding: Align data structures to cache line boundaries when possible but avoid excessive padding that can waste cache space and lead to evictions.

## Example of Cache-Friendly Code

## Matrix Multiplication Optimization

A common example of optimizing for cache usage is the modification of a simple matrix multiplication algorithm. Consider a naive implementation:

This naive approach can lead to poor cache utilization due to the access patterns of B[k][j]. By transposing matrix B beforehand and adjusting the loop to multiply A[i][k] with B[j][k], spatial locality is improved:

```
void transpose_matrix(int N, double B[N][N], double BT[N][N]) {
         for (int i = 0; i < N; i++) {</pre>
              for (int j = 0; j < N; j++)</pre>
                  BT[j][i] = B[i][j];
6
    }
     void optimized_matrix_multiply(int N, double A[N][N], double BT[N][N], double C[N][N
9
         for (int i = 0; i < N; i++) {</pre>
              for (int j = 0; j < N; j++) {
   C[i][j] = 0.0;</pre>
                  for (int k = 0; k < N; k++) {
                       C[i][j] \leftarrow A[i][k] * BT[j][k]; // Notice B is transposed
              }
         }
    }
19
```

Transposing matrix B improves spatial locality because consecutive iterations of the innermost loop access consecutive elements of BT, making better use of cache lines loaded into the cache.

#### Considerations

- Writing cache-friendly code often requires profiling and benchmarking to identify bottlenecks and understand the cache behavior of specific hardware.
- The benefits of optimization can vary depending on the architecture, including the size of caches and their associativity.
- Balancing readability and maintainability of code with optimization efforts is crucial. Highly optimized code can sometimes be more difficult to understand and maintain.

The last section that is being covered from this chapter this week is **Section 6.5: Putting It Together: The Impact of Caches on Program Performance**.

## Section 6.5: Putting It Together: The Impact of Caches on Program Performance

## Putting It Together: The Impact of Caches on Program Performance

The impact of caches on program performance is profound and multifaceted, encompassing aspects of computer architecture, software design, and programming practices. Caches serve as a critical intermediary between the fast CPU and the slower main memory, significantly reducing the average memory access time by exploiting the principles of locality. Understanding and optimizing for cache behavior can lead to substantial performance gains in software applications.

## **Understanding Cache Impact**

The effectiveness of cache memory in enhancing program performance is primarily determined by several key factors:

- Cache Hit Rate: The percentage of memory accesses that are successfully served by the cache. A higher cache hit rate means fewer accesses to slower main memory, directly improving program performance.
- Cache Miss Penalty: The additional time required to fetch data from main memory or lower levels of the cache hierarchy on a cache miss. Reducing the miss penalty or avoiding misses altogether is crucial for maintaining high performance.
- Data and Instruction Locality: Programs exhibiting high spatial and temporal locality can make more efficient use of cache, as they access memory in patterns well-suited to caching mechanisms.
- Cache-Friendly Algorithms and Data Structures: Choosing or designing algorithms and data structures that align with cache operation principles can significantly impact performance.

## Case Study: Optimizing Cache Usage

To illustrate the impact of caches on program performance, consider a software application that processes large data sets:

# Performance Improvement Through Cache Optimization

Initial observations show the application spends a significant portion of its execution time on memory accesses. Profiling indicates a low cache hit rate, suggesting poor cache utilization. **Optimization Steps:** 

- 1. Analyze the application's memory access patterns to identify bottlenecks.
- 2. Reorganize data structures to improve spatial locality, ensuring related data is stored contiguously in memory.
- 3. Refactor critical loops and algorithms to enhance temporal locality, making frequent accesses to the same data.
- 4. Implement loop tiling or blocking to process data in chunks that fit within the cache, minimizing cache line evictions.

After optimization, the application demonstrates a significantly higher cache hit rate and reduced memory access times, leading to an overall performance boost.

### Considerations

• The impact of cache optimization varies across different hardware architectures, requiring targeted profiling and testing.

- While optimizing for cache performance, it's important to maintain code clarity and maintainability, balancing optimization efforts with software engineering best practices.
- Continuous performance monitoring and analysis are essential, as changes in software behavior or data usage patterns can alter cache efficiency.



# Week 9: (3/11 - 3/17)



# Exceptional Flow Control

## 9.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 8.1 Exceptions
- Computer Systems: Chapter 8.2 Processes
- Computer Systems: Chapter 8.3 System Call Error Handling
- Computer Systems: Chapter 8.4 Process Control

#### 9.0.2 Piazza

Piazza discussions are optional this week. (3/19/24)

#### 9.0.3 Lectures

The lecture videos for this week are:

- Exceptional Control Flow Exceptions  $\approx 15$  min.
- Exceptional Control Flow Processes  $\approx 9$  min.
- Exceptional Control Flow Process Control  $\approx 43$  min.
- Exceptional Control Flow Shells  $\approx 16$  min.

#### The lecture notes for this week are:

- Exceptional Control Flow Exceptions And Processes I Lecture Notes
- Exceptional Control Flow Exceptions And Processes II Lecture Notes
- Exceptional Control Flow Exceptions And Processes III Lecture Notes
- Exceptional Control Flow Signals And Nonlocal Jumps I Lecture Notes
- Exceptional Control Flow Signals And Nonlocal Jumps II Lecture Notes
- Exceptional Control Flow Signals And Nonlocal Jumps III Lecture Notes
- Exceptional Control Flow Signals And Nonlocal Jumps IV Lecture Notes

### 9.0.4 Assignments

The assignment for this week is:

- Performance Lab (4/2/24)
- Performance Lab Extra Credit (4/2/24)
- Performance Lab Interview (4/2/24)

## 9.0.5 Chapter Summary

The chapter that is being covered this week is **Chapter 8: Exceptional Flow Control**. The first section that is being covered from this chapter this week is **Section 8.1: Exceptions**.

# Section 8.1: Exceptions

## Understanding Exceptions in Exceptional Control Flow

Exceptions are events that disrupt the normal flow of control in a program. They can arise from hardware errors, such as division by zero and invalid memory access, or from software errors, like segmentation faults. Exceptional Control Flow (ECF) provides a mechanism for handling such exceptions, allowing a program to recover from unexpected states or terminate gracefully.

## Types of Exceptions

Exceptions can be broadly classified into four types based on their source:

- Interrupts: Asynchronous exceptions caused by external events (e.g., I/O operations completing).
- **Traps**: Synchronous exceptions that occur as a result of executing an instruction (e.g., system calls).
- **Faults**: Errors that can potentially be corrected, allowing the program to resume (e.g., page faults).
- **Aborts**: Serious errors indicating a problem from which the program cannot recover (e.g., hardware failures).

## **Handling Exceptions**

The operating system (OS) plays a crucial role in managing exceptions. When an exception occurs, the OS must:

- 1. Determine the type of exception and the context in which it occurred.
- 2. Save the current state of the CPU (e.g., program counter, registers).
- 3. Transfer control to an appropriate exception handler.
- 4. Depending on the handler's action, either return control to the point where the exception occurred, or terminate the process.

Exception handlers are specialized pieces of code designed to address specific types of exceptions.

They can be part of the OS kernel, part of libraries, or embedded within applications.

## Example of Software Exception Handling

## Division by Zero Error Handling

In many programming environments, attempting to divide by zero will raise an exception. A typical handler might:

```
if (denominator == 0) {
    throw DivisionByZeroException;
} else {
    result = numerator / denominator;
}
```

This pseudo-code demonstrates how an application can explicitly check for division by zero and throw an exception, which can then be caught and handled appropriately, either by terminating the process or by performing an alternative action.

## Considerations in Exception Handling

- Exception handling requires careful design to ensure that all potential exceptions are accounted for, and that the system can recover gracefully from them.
- The performance of a system can be affected by the overhead of checking for and handling exceptions. This needs to be balanced with the need for robustness and reliability.
- Portability concerns arise as different hardware platforms and operating systems might implement
  exception handling differently. It's important for software developers to be aware of these
  differences.

The next section that is being covered this week is **Section 8.2: Processes**.

#### Section 8.2: Processes

#### **Processes**

A process is a fundamental concept in operating systems, representing an instance of a running program. It encompasses the program's code (also known as text), its current activity represented by the program counter, contents of the processor's registers, and the process's address space, which includes the memory allocated to it. Processes provide the abstraction needed for the efficient and protected execution of multiple programs on a single computing system.

#### **Key Components of a Process**

- Process Identifier (PID): A unique number that identifies a process. It is used by the system to manage process resources and scheduling.
- **Text Section**: Contains the executable code of the program.
- Data Section: Includes global and static variables, allocated and managed by the compiler.
- Heap: Dynamically allocated memory during process run time, used for data that varies in size.
- Stack: Stores temporary data such as function parameters, return addresses, and local variables.
- Program Counter (PC): Indicates the next instruction to execute.

• Registers: Small, fast storage locations within the CPU that hold instructions, addresses, and data.

## Process Life Cycle

The life cycle of a process includes several states:

- New: The process is being created.
- Ready: The process is waiting to be assigned to a processor.
- Running: Instructions are being executed.
- Waiting: The process is waiting for some event to occur (such as an I/O completion or a signal).
- Terminated: The process has finished execution.

## Process Control Block (PCB)

Each process is represented in the operating system by a process control block (PCB), also known as a task control block. It contains important information about the process, including:

- Process state
- · Process privileges
- Process ID
- Program counter
- CPU registers
- CPU scheduling information
- Memory-management information
- Accounting information
- I/O status information

#### **Process Scheduling**

The operating system uses process scheduling to manage the execution of multiple processes, allocating CPU time in a way that maximizes the efficiency of its use. Common scheduling algorithms include Round-Robin, First-Come, First-Served, and Priority Scheduling.

#### Considerations

- Efficient process management is crucial for the overall performance and responsiveness of a system.
- Security and isolation between processes prevent them from interfering with each other or accessing unauthorized resources.
- The creation, scheduling, and management of processes require careful consideration to avoid issues such as deadlocks and resource starvation.

The next section that is being covered from this chapter this week is **Section 8.3: System Call Error Handling**.

## Section 8.3: System Call Error Handling

## System Call Error Handling

System call error handling is a crucial aspect of robust software development, ensuring that programs can deal with failures in system calls gracefully. System calls, which provide the interface between a running program and the operating system, can fail for various reasons, such as invalid arguments, resource exhaustion, or permissions issues. Proper error handling allows a program to check for potential failures, take corrective action, or fail gracefully, improving the program's reliability and user experience.

## Mechanisms for Error Reporting

In Unix-like operating systems, system calls report errors through a global variable errno. Upon a successful system call, errno is not modified by the system, and if an error occurs, the system call returns an error indicator (usually -1) and sets errno to a value that indicates the specific error.

- Checking Return Values: Programs must check the return value of system calls to determine if an error occurred. If the return value indicates an error (e.g., -1 for many system calls), the program should then check error to determine the cause of the failure.
- Interpreting errno: The errno variable is set to specific error codes, defined in <errno.h>, such as EACCES, ENOSPC, EINVAL, etc., indicating the reason for the error.
- Using perror and strerror: The perror() function prints a human-readable error message to standard error, based on the current value of errno. Alternatively, strerror() returns a pointer to the textual representation of the current errno value, allowing for error messages to be formatted or logged by the program.

## **Best Practices for Error Handling**

- Consistent Error Checking: Systematically check for errors after every system call that can fail. Neglecting these checks can lead to unpredictable program behavior.
- Meaningful Error Reporting: Use perror() or strerror() to provide clear, meaningful error messages to the user or log files, aiding in debugging and user support.
- Resource Management: Ensure that resources (e.g., file descriptors, memory) are properly released or cleaned up in error conditions to avoid resource leaks.
- Error Recovery: Where possible, implement strategies to recover from errors, such as retrying the system call, using alternative methods, or failing gracefully with a clear error message.

### Considerations

- System call error handling is platform-specific; the mechanisms described apply to Unix-like systems, and similar but different approaches are used on other operating systems.
- The granularity and specificity of error codes can vary, affecting how precisely a program can respond to different error conditions.
- Proper error handling can significantly increase the complexity of code. It's important to balance the thoroughness of error checks with code readability and maintainability.

The last section that is being covered from this chapter this week is **Section 8.4: Process Control**.

### Section 8.4: Process Control

#### **Process Control**

Process control in operating systems encompasses the mechanisms and strategies used to manage the lifecycle of processes. This includes process creation, execution, suspension, resumption, and termination. Effective process control is vital for the efficient allocation of system resources, ensuring system responsiveness and stability.

## **Key Aspects of Process Control**

- **Process Creation**: Processes can be created during system boot, by executing a system call, or by a running process (parent) that creates another process (child). The creation involves allocating a unique process identifier (PID), allocating memory, and initializing the process control block (PCB).
- **Process Execution**: The state of a process transitions from ready to running when the scheduler selects it. Execution involves the process performing its designated operations, such as computations and I/O.
- Process Suspension and Resumption: Processes may be suspended (moved from running to waiting state) due to I/O requests, interrupts, or higher priority processes becoming runnable. Suspended processes can be resumed (moved from waiting to ready state) once the reason for suspension is resolved.
- Process Termination: A process terminates when it finishes its execution or when it is explicitly killed due to an error or a kill command. Termination involves reclaiming any resources allocated to the process.

# Process Control Block (PCB)

The PCB is a data structure maintained by the operating system for every process. It contains important information necessary for process control:

- Process State: The current state of the process (e.g., running, waiting, ready).
- Process Privileges: Information regarding the process's allowed operations.
- Process ID: A unique identifier for the process.
- CPU Registers: The contents of the processor's registers for the process.
- CPU Scheduling Information: Priority, scheduling queue pointers, and other scheduling parameters
- Memory-Management Information: Page tables, segment tables, and limits.
- Accounting Information: Amount of processor time used, time limits, job or process numbers.
- I/O Status Information: List of I/O devices allocated to the process, a list of open files, etc.

#### Context Switching

Context switching is the process of saving the state of a currently running process so that it can be resumed at a later time and loading the state of another process to begin its execution. This is crucial for multitasking, allowing the CPU to switch between processes efficiently.

## Considerations

- Efficiency: Process control mechanisms must be designed to minimize overhead, especially for context switching, to maintain system performance.
- **Security and Isolation**: Proper management of process privileges and resources is necessary to prevent unauthorized access and ensure system stability.
- Scalability: The process control system must be capable of managing a large number of processes efficiently.



# Week 10: (3/18 - 3/24)



# Exceptional Flow Control

# 10.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 8.5 Signals
- Computer Systems: Chapter 8.6 Non-Local Jumps
- Computer Systems: Chapter 8.7 Tools For Manipulating Processes

# 10.0.2 Piazza

Piazza discussions are optional this week. (3/26/24)

## 10.0.3 Lectures

The lecture videos for this week are:

- Exceptional Control Flow Sending Signals  $\approx 13$  min.
- Exceptional Control Flow Receiving Signals  $\approx 29$  min.
- Exceptional Control Flow Non-Local Jumps  $\approx 10$  min.
- A Tale of Two Signals (Shell Lab)  $\approx 8$  min.

The lecture notes for this week are:

- Exceptional Control Flow Signals And Nonlocal Jumps IV Lecture Notes
- Virtual Memory Concepts I Lecture Notes
- Virtual Memory Concepts II Lecture Notes
- Virtual Memory Systems Lecture Notes

## 10.0.4 Assignments

The assignment for this week is:

- Performance Lab (4/2/24)
- Performance Lab Extra Credit (4/2/24)
- Performance Lab Interview (4/2/24)

## 10.0.5 Chapter Summary

The chapter that is being covered for this week is **Chapter 8: Exceptional Control Flow**. The first section that is being covered from this chapter this week is **Section 8.5: Signals**.

## Section 8.5: Signals

#### **Signals**

Signals are a form of software interrupt used to notify a process that an event has occurred. In UNIX and UNIX-like operating systems, signals are a primary method for exceptional control flow, allowing processes to be interrupted and to handle predefined or user-defined events asynchronously.

# **Key Aspects of Signals**

- Signal Generation: Signals can be generated by the operating system in response to hardware exceptions (e.g., division by zero, segmentation fault), by explicit user request via command-line utilities (e.g., 'kill' command), or by a process calling the 'kill' function to send a signal to another process.
- **Signal Delivery**: When a signal is generated, the operating system delivers it to the target process. If the process has registered a signal handler, that handler is executed; otherwise, the default action associated with the signal is performed.
- Signal Handling: Processes can choose to ignore signals, handle them using a custom signal handler, or accept the default behavior defined by the operating system. The default actions can include terminating the process, ignoring the signal, or suspending/resuming the process.
- Types of Signals: There are various signals defined in UNIX-like systems, such as SIGINT (interrupt from keyboard), SIGTERM (termination signal), SIGSEGV (segmentation violation), and SIGALRM (timer signal), each with its specific intended use.

#### Signal Handling Mechanisms

To handle signals, a process must specify how each signal is processed, using system calls to change the disposition of a signal. The 'signal' and 'sigaction' system calls are commonly used for this purpose:

- signal: Allows a process to specify the signal handling function for a particular signal.
- sigaction: Provides a more comprehensive and reliable mechanism than 'signal', allowing detailed control over signal handling, including blocking additional signals during the handling of a current signal.

#### Considerations in Signal Handling

- Atomicity: Signal handlers should execute quickly and avoid non-reentrant functions to prevent data races and inconsistencies.
- **Synchronization**: Careful synchronization is necessary when signals are used in concurrent programs, to avoid deadlocks and race conditions.
- **Portability**: Signal behavior can vary across different UNIX-like systems, so portable programs should rely on well-defined, consistent signal handling features.

The next section that is being covered from this chapter this week is **Section 8.6: Nonlocal Jumps**.

#### Section 8.6: Nonlocal Jumps

### **Nonlocal Jumps**

Nonlocal jumps are a mechanism in C that allows the control flow of a program to transfer directly from a point in the program to another without returning to the caller. This feature is implemented through the setjmp and longimp functions in the C standard library. Nonlocal jumps are particularly

useful in handling error conditions and implementing control structures that cannot be easily expressed with standard C constructs.

### **Key Aspects of Nonlocal Jumps**

- **setjmp**: This function initializes a jump buffer with the current execution context, including the program counter, stack pointer, and set of CPU registers, for later use by longjmp. It returns 0 when called directly and a nonzero value when returning from a call to longjmp.
- **longjmp**: This function transfers control back to the point where the corresponding setjmp was called, using the jump buffer to restore the execution context. The longjmp function does not return to its caller but causes setjmp to return again, this time with a nonzero value specified as the second argument to longjmp.
- Usage: Nonlocal jumps are often used in error handling where an error in a deeply nested function call needs to unwind the call stack quickly to a recovery point. They can also be used to implement coroutines or escape from complex control structures.
- Caution: While powerful, nonlocal jumps should be used sparingly due to their potential to disrupt normal control flow and resource management. They can complicate program understanding and maintenance, and they may skip the execution of critical cleanup code (e.g., resource deallocation).

## Considerations in Using Nonlocal Jumps

- Resource Leakage: Jumping out of a scope without executing the corresponding destructors or free operations can lead to resource leaks. Programmers need to ensure manual cleanup before performing a nonlocal jump.
- Variable Volatility: Variables that should retain their values across a longjmp should be declared as volatile to prevent undesired optimizations by the compiler.
- Signal Safety: Nonlocal jumps can be used in signal handlers to exit from a blocking system call or an infinite loop. However, their use should be carefully evaluated for signal safety and reentrancy.
- Compatibility and Portability: The behavior of setjmp and longjmp is well-defined in the C standard, but their interaction with complex program structures, third-party libraries, and system resources may vary across different environments and should be thoroughly tested.

The last section that is being covered from this chapter this week is **Section 8.7: Tools For Manipulating Processes**.

#### Section 8.7: Tools For Manipulating Processes

# **Tools For Manipulating Processes**

Tools for manipulating processes are essential for process management in operating systems. These tools allow users and system administrators to create, monitor, control, and terminate processes. Understanding and effectively using these tools is crucial for system maintenance and optimization.

### **Key Tools and Their Functions**

• ps (Process Status): Displays information about active processes. It can show the process ID, current state, memory usage, and command line that initiated the process, among other details. Users can tailor the output using various options.

- top: Provides a dynamic, real-time view of the system's running processes. It displays a list of processes that consume the most system resources at any given time, making it invaluable for monitoring system performance and identifying processes that may be affecting system efficiency.
- **kill**: Sends a signal to a process, typically to terminate the process. While commonly used for termination, the 'kill' command can send any signal to processes, allowing for versatile process control.
- **nice**: Changes the scheduling priority of a process, which can be used to influence the process's execution order. Processes with higher priority are allocated more CPU time than those with lower priority, affecting their performance.
- **nohup**: Allows a process to continue running in the background even after the user has logged out from the system. This is particularly useful for long-running processes in remote sessions.
- **jobs**: Lists all jobs that were started in the current shell session, showing their job number, state (running, stopped, etc.), and the command line that started them.
- **bg** (Background): Continues a stopped process by running it in the background. This is useful for multitasking and freeing up the terminal while a process runs.
- fg (Foreground): Moves a background process into the foreground, making it the current job and allowing the user to interact with it directly through the terminal.

# Considerations for Process Manipulation

- System Performance: Careful management of process priorities and resources is essential to maintain system performance. Overloading the system with high-priority processes can lead to resource contention and decreased system responsiveness.
- Security: Privileged processes and the ability to send signals to processes require appropriate permissions. Misuse can lead to security vulnerabilities, such as unauthorized access or denial of service.
- Stability: Terminating processes, especially system or critical processes, should be done with caution to avoid system instability or data loss.
- Compatibility: Some tools or their options might behave differently across Unix/Linux distributions. It's important to refer to the specific documentation for your operating system.

# Week 11: (4/1 - 4/7)



# Virtual Memory

# 11.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 9.1 Physical And Virtual Memory
- Computer Systems: Chapter 9.2 Address Spaces
- Computer Systems: Chapter 9.3 VM As A Tool For Caching
- Computer Systems: Chapter 9.4 VM As A Tool For Memory Management
- Computer Systems: Chapter 9.5 VM As A Tool For Memory Protection
- Computer Systems: Chapter 9.6 Address Translation
- Computer Systems: Chapter 9.7 Case Study: Core i7 / Linux

#### 11.0.2 Piazza

Piazza discussions are optional this week. (4/9/24)

#### 11.0.3 Lectures

The lecture videos for this week are:

- Virtual Memory Basic Concepts  $\approx 19$  min.
- Virtual Memory Address Translation  $\approx 16$  min.
- Virtual Memory End-to-End Address Translation  $\approx 12$  min.
- Virtual Memory The Core i7 / Linux Memory System  $\approx 16$  min.

#### The lecture notes for this week are:

- Virtual Memory Systems II Lecture Notes
- Virtual Memory Systems III Lecture Notes
- Dynamic Memory Allocation Basic Concepts I Lecture Notes
- Dynamic Memory Allocation Basic Concepts II Lecture Notes
- Dynamic Memory Allocation Advanced Concepts I Lecture Notes
- Dynamic Memory Allocation Advanced Concepts II Lecture Notes

#### 11.0.4 Assignments

The assignment for this week is:

- Performance Lab (4/2/24)
- Performance Lab Extra Credit (4/2/24)
- Performance Lab Interview (4/2/24)

#### 11.0.5 Quiz

The quizzes for this week are:

• Chapter 9a Quiz • Chapter 9a Finalized Quiz (4/9/24)

# 11.0.6 Chapter Summary

The chapter that is being covered this week is **Chapter 9: Virtual Memory**. The first section that is being covered from this chapter this week is **Section 9.1: Physical And Virtual Addressing**.

# Section 9.1: Physical And Virtual Addressing

# Physical and Virtual Addressing

Physical and virtual addressing are fundamental concepts in the architecture of modern computer systems, enabling efficient and flexible memory management. Understanding these concepts is crucial for grasping how operating systems manage memory resources and how programs interact with memory.

## Virtual Addressing:

Virtual memory is a memory management capability of an operating system (OS) that uses hardware and software to allow a computer to compensate for physical memory shortages, temporarily transferring data from random access memory (RAM) to disk storage. This process is transparent to the user and enables a computer to use more memory than is physically available in the system.

- **Abstraction:** Virtual addressing provides an abstraction layer that allows each process to operate as though it has its own, private memory space, isolated from other processes. This abstraction simplifies programming and increases security and stability by preventing processes from accessing each other's memories.
- Address Translation: The OS, with support from the hardware, translates virtual addresses to physical addresses. This translation is typically done via a page table, which contains mappings from virtual addresses to physical addresses.
- Benefits: The primary benefits of virtual addressing include improved security through isolation, easier memory management, and the ability to run programs larger than the available physical memory.

#### Physical Addressing:

Physical addressing refers to the actual addresses in a computer's memory hardware. When a program accesses memory, the virtual addresses specified by the program are translated into physical addresses by the memory management unit (MMU).

• **Direct Access:** In systems using physical addressing, programs directly access physical memory locations. This method is simple but lacks the flexibility and security benefits of virtual memory.

- Hardware-Dependent: Physical addresses are directly related to hardware. The layout of physical memory might include various regions dedicated to different uses (e.g., RAM, memory-mapped I/O).
- **Limitations:** The main limitations of physical addressing include a lack of process isolation, potential for memory conflicts among processes, and limited ability to manage memory efficiently.

#### **Interactions and Translation:**

The transition from virtual to physical addresses involves several key steps, primarily facilitated by the MMU. The process involves consulting a page table to find the frame number associated with a virtual page number, then combining this frame number with the offset to produce the physical address.

- Page Tables: Managed by the OS, these tables play a crucial role in address translation, mapping virtual pages to physical frames. Modern systems often use multi-level page tables to efficiently manage this mapping.
- TLB (Translation Lookaside Buffer): A cache that stores recent translations of virtual addresses to physical addresses to speed up address translation. The TLB can significantly reduce memory access times in systems with virtual memory.

#### Considerations:

- **Performance:** While virtual memory offers significant advantages, it can also introduce overhead due to the need for address translation and potential page faults, which occur when the data is not in physical memory and must be retrieved from disk.
- Security: Virtual memory can enhance security by isolating the address spaces of different processes, but it also requires careful management to prevent vulnerabilities like buffer overflow attacks.
- Hardware Support: Effective virtual memory management requires hardware support, including the MMU for address translation and mechanisms for efficient page table management.

The next section that is being covered from this chapter this week is **Section 9.2: Address Spaces**.

# Section 9.2: Address Spaces

#### Address Spaces

Address spaces are a key concept in computer architecture and operating systems, defining the range of discrete addresses that a process or system can use to access memory or other resources. Address spaces play a crucial role in memory management, providing a framework for isolating the memory used by different processes and for mapping virtual addresses to physical memory locations.

#### Types of Address Spaces

There are generally three types of address spaces in computing:

- Physical Address Space: This is the range of addresses that a computer's memory management unit (MMU) can physically address, corresponding to the actual physical memory (RAM) and memory-mapped peripherals. The size of the physical address space is determined by the hardware, particularly the CPU architecture (e.g., 32-bit vs. 64-bit).
- Virtual Address Space: Each process running on a system is given its own virtual address space, which is a range of addresses that the process can use. The operating system, with help

from the hardware, maps these virtual addresses to physical addresses. This allows for processes to be isolated from each other and for more efficient use of physical memory.

• Logical Address Space: Sometimes considered synonymous with virtual address space, logical address space can also refer to a view of memory that is abstracted from both physical and virtual memory, such as the way a program views memory through specific data structures or segmentation models.

### Significance of Address Spaces

- Memory Isolation: Address spaces allow different processes to run without interfering with each other's memory, enhancing system stability and security.
- **Memory Management:** They provide a mechanism for the operating system to efficiently allocate, track, and manage memory usage among multiple processes.
- Virtual Memory: The use of virtual address spaces enables systems to use disk storage as an extension of RAM, allowing for the execution of programs that require more memory than is physically available.

#### Implementation and Management

- Page Tables and Segmentation: Operating systems implement virtual address spaces using page tables or segmentation. Page tables map virtual addresses to physical addresses, while segmentation divides the address space into segments with different attributes.
- Memory Allocation: The OS allocates and manages memory within these address spaces, using strategies like paging and segmentation to efficiently use physical memory and to provide virtual memory functionalities.
- Hardware Support: Hardware features, such as the MMU, support the mapping of virtual to physical addresses. Additionally, modern CPUs offer features like Extended Page Tables (EPT) for virtualization, further extending the concept of address spaces.

# Considerations

- **Performance:** While address spaces and virtual memory provide numerous benefits, they can introduce performance overhead due to the need for address translation and the potential for page faults.
- Security: Proper management of address spaces is crucial for security. Vulnerabilities in how address spaces are handled can lead to security breaches, such as buffer overflows and privilege escalation attacks.
- Compatibility: Software developers must be aware of the address space limitations of the systems they are developing for, especially when dealing with 32-bit vs. 64-bit systems and the corresponding memory addressing limits.

Address spaces are a fundamental component of modern computing, allowing for complex memory management schemes and enhancing the security and reliability of computer systems.

The next section that is being covered from this chapter this week is **Section 9.3: VM As A Tool** For Caching.

## Section 9.3: VM As A Tool for Caching

# VM As A Tool For Caching

Virtual memory (VM) serves as an essential tool for caching, utilizing the hard drive as a cache for data stored in RAM. This technique allows systems to run applications that require more memory than is physically available, improving performance and memory utilization through sophisticated management strategies.

#### Understanding the Basics

At its core, virtual memory allows an operating system to use a portion of the hard disk as if it were additional RAM, creating a seamless and larger virtual address space. This mechanism involves the temporary storage (paging out) of data from RAM to disk storage and retrieving it back into RAM (paging in) when needed.

### Mechanisms of VM Caching

- Page Replacement Algorithms: VM systems use page replacement algorithms to decide which memory pages to store in RAM and which to offload to disk. Algorithms such as Least Recently Used (LRU), FIFO (First-In, First-Out), and others aim to minimize page faults and optimize memory access times.
- **Demand Paging:** Not all pages of a program need to be loaded into physical memory at once. Demand paging allows the system to load only those pages that are required, effectively using the hard disk as a cache for pages that are not currently in use.
- Write-back and Write-through Caching: These caching strategies are used to manage how data is written to storage. Write-back caching allows for faster memory operations by delaying writes to the disk, whereas write-through caching writes data to both the cache (RAM) and disk simultaneously, ensuring data integrity.

#### Benefits of VM Caching

- Efficiency: By only loading necessary pages into RAM and storing infrequently accessed data on disk, VM caching optimizes the use of physical memory and improves application performance.
- Cost-effectiveness: RAM is more expensive and limited compared to disk storage. VM caching leverages this cost differential by using cheaper disk space to effectively extend the memory available to applications.
- Flexibility: VM provides a flexible environment for managing applications' memory needs, allowing for dynamic allocation and deallocation of memory resources based on current system demands.

#### Considerations

- Page Faults and Performance: While VM caching can significantly enhance performance, page faults—instances where the data is not found in RAM and must be fetched from disk—can cause delays. Effective page replacement strategies are crucial for minimizing the impact of page faults.
- Disk I/O Overhead: Frequent paging between RAM and disk can lead to increased disk I/O, potentially becoming a bottleneck. Optimizing the paging process and minimizing unnecessary disk accesses are important for maintaining system performance.
- Configuration and Tuning: The effectiveness of VM as a caching tool depends on how well it is configured and tuned to the specific workload and hardware characteristics of a

system. Administrators and developers must carefully consider these aspects to achieve optimal performance.

VM caching is a powerful technique that exploits the hierarchical nature of storage devices to enhance memory management and system performance. By effectively using disk space as an extension of RAM, systems can handle larger and more complex workloads, providing a cost-effective and efficient solution for managing memory resources.

The next section that is being covered from the chapter this week is **Section 9.4: VM As A Tool** For Memory Management.

# Section 9.4: VM As A Tool For Memory Management

# VM as a Tool for Memory Management

Virtual memory (VM) is a critical technology in operating systems that allows for the effective management of a computer's memory resources. By abstracting the memory available to programs from the physical memory in the system, VM provides a flexible and efficient way to use both RAM and disk storage to run applications.

## Core Concepts of Virtual Memory in Memory Management

- **Abstraction:** VM abstracts the system's memory, providing each process with the illusion of having its own vast, contiguous memory space, regardless of the physical memory available. This abstraction simplifies programming and memory usage, allowing for more complex and memory-intensive applications.
- Process Isolation: By giving each process its own virtual address space, VM enhances system security and stability. Process isolation prevents one process from accessing or modifying the memory of another process, thereby reducing the risk of system crashes and security breaches.
- Memory Allocation: VM simplifies memory allocation, allowing for dynamic and flexible distribution of memory resources among running processes. It supports more efficient use of memory through techniques such as dynamic loading and lazy allocation.

#### Mechanisms of VM in Memory Management

- Paging: VM systems use paging to divide the virtual memory space into blocks of a fixed size, called pages. When a process needs to access memory, the required pages are loaded into physical memory. Pages not actively used can be swapped out to disk, allowing the system to manage more processes than would fit in physical RAM alone.
- Segmentation: Some VM systems use segmentation, dividing memory into segments based on logical divisions within programs, such as functions or data structures. Segmentation can be used alongside paging to further enhance memory management.
- Swapping: Swapping is the process of moving entire processes in and out of physical memory to disk. While less common in modern systems due to the overhead and latency involved, it is a crucial part of VM, allowing the system to free up physical memory by moving less active processes to disk storage.

#### Benefits of Using VM for Memory Management

• **Increased Multitasking:** VM allows for the execution of more processes simultaneously by efficiently managing the available physical memory and extending it with disk storage.

- Memory Protection: VM provides memory protection mechanisms, ensuring that one process cannot interfere with another's memory, thereby enhancing the overall stability and security of the system.
- Efficient Memory Usage: By dynamically allocating memory and only loading necessary parts of a program into physical memory, VM makes more efficient use of the system's memory resources.

# Considerations in VM Memory Management

- **Performance Overhead:** The benefits of VM come with the cost of additional overhead for tasks such as page mapping, swapping, and handling page faults, which can impact system performance.
- Storage Requirements: Using disk space as an extension of RAM requires sufficient disk space and can lead to increased wear on storage devices due to frequent read/write operations.
- Optimization: Effective memory management using VM requires careful configuration and optimization of the system's paging and swapping algorithms to balance performance with memory availability.

Virtual memory is an indispensable tool in modern computing for managing memory resources. Through the use of abstraction, process isolation, and flexible memory allocation techniques, VM enhances the capabilities of systems to run multiple, complex applications efficiently, despite the physical limitations of RAM.

The next section that is being covered from this chapter this week is Section 9.5: VM As A Tool For Memory Protection.

#### Section 9.5: VM As A Tool For Memory Protection

### VM as a Tool for Memory Protection

Virtual Memory (VM) is not only a mechanism for extending the apparent amount of physical memory but also plays a crucial role in protecting memory. It ensures that each process operates in its own isolated memory space, thereby preventing unauthorized access and modifications. This section explores how VM contributes to memory protection in modern computer systems.

#### Foundations of Memory Protection with VM

Memory protection is a safety mechanism that prevents a process from accessing memory that has not been allocated to it. This is essential for maintaining system stability and security. VM aids in this by providing several layers of abstraction and control:

- **Isolation:** Each process is given a separate virtual address space, which is mapped to the physical memory by the operating system and hardware. This separation ensures that processes cannot directly access each other's memory.
- Access Rights: VM systems can define access rights for different regions of memory. For example, certain areas can be marked as read-only or executable, preventing unauthorized writing or execution of code, respectively.
- Page Tables: The operating system uses page tables to manage the mapping of virtual memory to physical memory. These tables can also store access rights for each page, providing fine-grained control over memory access.

#### **Implementing Memory Protection**

- Hardware Support: Modern processors support memory protection at the hardware level through features like the Memory Management Unit (MMU), which facilitates the mapping of virtual addresses to physical addresses while enforcing access controls specified by the operating system.
- Software Mechanisms: The operating system implements memory protection policies using software mechanisms, such as creating and managing page tables and handling page faults when access violations occur.

## Benefits of VM in Memory Protection

- **Security:** By isolating the address space of each process, VM makes it much harder for malicious processes to affect the integrity of other processes or the operating system.
- Stability: VM prevents processes from accidentally overwriting each other's data, which enhances the overall stability of the system by reducing the chance of crashes caused by software errors.
- Control and Flexibility: VM allows the operating system to enforce different access rights (e.g., read, write, execute) on different pages of memory, providing a flexible and powerful mechanism for controlling how memory is used.

#### Challenges and Considerations

- **Performance Overhead:** Implementing memory protection using VM introduces some overhead, as each memory access requires translation and access checks. However, the benefits in terms of security and stability often outweigh these performance costs.
- Complexity: The mechanisms behind VM and memory protection add complexity to both hardware and software design, requiring careful planning and implementation to ensure they work effectively without compromising system performance.
- Configuration and Management: Proper configuration and management of VM and memory protection features are essential to balance security, stability, and performance, particularly in systems with diverse workloads and security requirements.

VM serves as a fundamental tool for memory protection in computer systems, leveraging hardware and software mechanisms to isolate processes, enforce access controls, and maintain system integrity. Despite the challenges, the role of VM in safeguarding memory is indispensable in modern computing environments.

The next section that will be covered from this chapter this week is **Section 9.6: Address Translation**.

### Section 9.6: Address Translation

#### Address Translation

Address translation is a core mechanism of virtual memory systems, enabling the conversion of virtual addresses to physical addresses. This process allows programs to use virtual addresses for memory operations, while the hardware ensures that these addresses are correctly mapped to the actual physical memory locations. Address translation is crucial for implementing virtual memory, providing memory protection, and supporting multitasking by ensuring that each process operates in

its own independent address space.

#### Mechanisms of Address Translation

The primary mechanism for address translation in modern computer systems involves a combination of hardware and software components, notably the Memory Management Unit (MMU) and the operating system's management of page tables.

- Memory Management Unit (MMU): A hardware component that is responsible for the real-time translation of virtual addresses to physical addresses. The MMU uses a structure called a page table, provided by the operating system, to find out how virtual addresses map to physical addresses.
- Page Tables: Data structures maintained by the operating system that contain mappings from virtual addresses to physical addresses. Each entry in a page table represents the mapping of a virtual page to a physical frame.

#### Process of Address Translation

The process of translating a virtual address to a physical address typically involves the following steps:

- 1. The processor generates a virtual address as part of an instruction execution.
- 2. The MMU takes the virtual address and divides it into two parts: the virtual page number (VPN) and the offset within that page.
- 3. Using the VPN, the MMU looks up the corresponding page table entry to find the physical frame number (PFN) associated with that VPN.
- 4. The physical address is then constructed by combining the PFN with the offset from the original virtual address.

# Example of Address Translation

Consider a simple system where the virtual memory space and the physical memory space are both 16KB, divided into 4KB pages/frames. This means there are 4 pages in both the virtual and physical space, and a page table can be represented with 4 entries.

- Assume a virtual address of 0x1234 is generated by the processor. Given a page size of 4KB (or 4096 bytes), the address can be divided into:
  - Virtual Page Number (VPN): 0x1 (the high-order bits)
  - Offset: 0x234 (the low-order bits)
- If the page table entry for VPN 0x1 maps to a Physical Frame Number (PFN) of 0x2, the physical address would be constructed by appending the offset to the PFN, resulting in a physical address of 0x2234.

## Considerations in Address Translation

- Translation Lookaside Buffer (TLB): To speed up the translation process, many systems use a TLB, which is a cache that stores recent mappings from the page table. This can significantly reduce the time required for address translation.
- Multi-Level Page Tables: For systems with large address spaces, page tables themselves can become very large. Multi-level page tables are used to efficiently manage these large mappings by breaking them down into more manageable pieces.

• **Performance:** While address translation provides many benefits, it can introduce latency due to the need to access the page table (and possibly the disk, in case of a page fault). Systems are designed to minimize this overhead through careful optimization of the page table structure and TLB.

Address translation is a fundamental aspect of virtual memory systems, enabling the seamless use of virtual addresses and supporting key features like process isolation and efficient memory utilization. The mechanisms and strategies employed to implement address translation are critical for the performance and reliability of modern computing systems.

The last section that is being covered from this chapter this week is **Section 9.7**: Case Study: The Intel Core i7/Linux Memory System.

# Section 9.7: Case Study: The Intel Core i7/Linux Memory System

## Case Study: The Intel Core i7/Linux Memory System

This case study examines the memory system of the Intel Core i7 processor when operating under the Linux operating system. The Intel Core i7 represents a significant advancement in processor technology, featuring a sophisticated memory hierarchy and advanced memory management capabilities. Coupled with the Linux operating system, which is known for its robust memory management, this combination offers insights into high-performance computing and efficient memory handling.

### Intel Core i7 Memory Features

The Intel Core i7 processor incorporates several key features designed to enhance its memory system's performance:

- Integrated Memory Controller (IMC): The Core i7 includes an IMC, which reduces memory latency by connecting the processor directly to the memory, bypassing the traditional Front-Side Bus (FSB) approach.
- Three-Level Cache Hierarchy: The processor features a three-level cache hierarchy (L1, L2, and L3) designed to improve data access times. The L1 and L2 caches are per core, while the L3 cache is shared across all cores, optimizing data sharing and reducing cache miss rates.
- QuickPath Interconnect (QPI): Replacing the older FSB, the QPI provides a high-speed interface between the processor cores, the IMC, and other components, enhancing data transfer rates and overall system performance.
- Turbo Boost Technology: This technology allows cores to dynamically adjust their clock speeds based on the workload and thermal conditions, potentially improving performance for memory-intensive applications.
- **Hyper-Threading:** By allowing each physical core to execute two threads simultaneously, hyper-threading effectively doubles the core count from the operating system's perspective, enhancing parallel processing capabilities, particularly in memory-bound applications.

#### Linux Memory Management

Linux's memory management system complements the hardware features of the Core i7, providing efficient handling of processes, virtual memory, and device I/O:

• Virtual Memory Management: Linux uses a combination of paging and segmentation for virtual memory management, with an advanced page replacement algorithm to optimize the use of the physical memory available.

- Transparent Huge Pages (THP): To reduce the overhead of managing large amounts of memory, Linux supports transparent huge pages that allow the operating system to allocate memory in larger chunks, reducing the number of page table entries needed and improving performance for certain workloads.
- NUMA (Non-Uniform Memory Access) Support: Given the Core i7's support for multi-core and multi-processor configurations, Linux's NUMA-aware memory management helps optimize memory usage across different cores and processors, improving efficiency and performance in large-scale systems.
- **Kernel Samepage Merging (KSM):** This feature allows Linux to merge identical memory pages across different processes into a single page, reducing the overall memory footprint of running applications.

#### Performance Considerations

The combination of the Intel Core i7's advanced memory features and Linux's robust memory management results in a highly efficient memory system capable of high performance and scalability. However, achieving optimal performance requires careful tuning of both hardware settings and the Linux kernel, including cache configurations, memory allocation policies, and scheduler settings, to match the workload characteristics.



# Week 12: (4/8 - 4/14)



# Virtual Memory

# 12.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 9.8 Memory Mapping
- Computer Systems: Chapter 9.9 Dynamic Memory Allocation
- Computer Systems: Chapter 9.10 Garbage Collection
- Computer Systems: Chapter 9.11 Common Memory Related Bugs In C Programs

#### 12.0.2 Piazza

Piazza discussions are optional this week. (4/16/24)

#### 12.0.3 Lectures

The lecture videos for this week are:

- Virtual Memory Memory Mapping  $\approx 10$  min.
- Virtual Memory Dynamic Memory Basic Concepts ≈ 14 min.
- Virtual Memory Dynamic Memory Implicit Allocator  $\approx 16$  min.
- Virtual Memory Dynamic Memory Explicit & Segregated Allocators  $\approx 10$  min.
- Virtual Memory Garbage Collection Memory Perils  $\approx 23$  min.
- ShellLab Orientation  $\approx 8$  min.

#### 12.0.4 Assignments

The assignment for this week is:

- Shell Lab (4/23/24)
- Shell Lab Interview (4/23/24)

## 12.0.5 Quiz

The quizzes for this week are:

• Chapter 9b Quiz • Chapter 9b Finalized Quiz (4/16/24)

#### 12.0.6 Exam

The exam for this week is:

- Practice Exam 3 (4/10/24)
- Exam 3 Exam 3 Making Programs Run Really Fast (4/10/24)

# 12.0.7 Chapter Summary

The chapter that is being covered this week is **Chapter 9: Virtual Memory**. The first section that is being covered from this chapter this week is **Section 9.8: Memory Mapping**.

## Section 9.8: Memory Mapping

# **Memory Mapping**

Memory mapping is a critical feature of virtual memory systems that allows for the direct mapping of files or devices into a process's address space, facilitating file I/O operations and inter-process communication. This mechanism enhances both the performance and the flexibility of memory management.

## File and Device Mapping:

Through memory mapping, files or device memory can be treated as if they were part of the process's own address space. This direct mapping allows processes to read from and write to files or devices using standard memory access methods rather than specific I/O system calls.

- Simplification of I/O Operations: Memory mapping simplifies file I/O operations by allowing programs to use pointer arithmetic and memory access operations, rather than read and write system calls, to interact with files.
- **Performance Improvement:** It can improve performance by taking advantage of the page caching provided by the virtual memory system. This means that frequently accessed file data can be kept in RAM, reducing disk access.
- Lazy Loading: Memory-mapped files support lazy loading, where parts of the file are only loaded into physical memory when they are accessed, which can reduce the amount of physical memory used and speed up the initial loading of the file.

#### Memory-Mapped I/O:

Memory-mapped I/O allows devices to be mapped into the virtual address space of a process. This mapping enables direct read and write access to device registers through standard memory access instructions, streamlining the interaction between software and hardware.

- **Direct Device Communication:** By mapping device registers into the virtual address space, programs can directly manipulate hardware devices, leading to more efficient device communication.
- Reduced Context Switching: This method reduces the need for system calls and context switching between user mode and kernel mode, enhancing system performance.
- Unified Memory Access: Memory-mapped I/O provides a unified mechanism for accessing both memory and I/O devices, simplifying programming and reducing the complexity of device drivers.

#### **Shared Memory:**

Memory mapping is also instrumental in implementing shared memory between processes. Shared memory regions allow multiple processes to access and modify data in a shared address space, facilitating efficient inter-process communication.

- Inter-Process Communication: Shared memory is a fast method of inter-process communication (IPC), as it avoids the overhead of message passing or signal handling.
- Synchronization: Access to shared memory must be carefully synchronized to prevent race conditions and ensure data consistency. Mechanisms like semaphores or mutexes are often used for this purpose.
- Efficiency: Shared memory is one of the most efficient forms of IPC because it does not involve copying data between the address spaces of processes, reducing the overhead on the system.

#### Considerations:

- **Security:** Proper management and protection of memory-mapped regions are crucial to prevent unauthorized access and ensure the integrity of the mapped files or devices.
- Resource Management: The OS must efficiently manage the virtual address space and physical memory resources to support memory mapping effectively, especially in systems with limited memory.
- **Portability:** The specifics of memory mapping can vary between operating systems, affecting the portability of applications that rely heavily on this feature.

The next section that is being covered from the chapter this week is **Section 9.9: Dynamic Memory Allocation**.

## Section 9.9: Dynamic Memory Allocation

#### Dynamic Memory Allocation

Dynamic Memory Allocation is an essential concept in computer programming that allows programs to obtain memory at runtime. This flexibility enables efficient use of memory for data structures whose size cannot be determined at compile time, supporting complex data manipulation and resource-intensive computations.

#### Concepts and Mechanisms:

Dynamic memory allocation involves the runtime allocation and deallocation of memory blocks according to a program's needs. This is in contrast to static memory allocation, where the memory size for variables is fixed and allocated at compile time.

- **Heap:** The heap is the memory area dedicated to dynamic memory allocation. Unlike the stack, the size of the heap adjusts dynamically as the program runs, accommodating the allocation and deallocation of memory blocks.
- Allocation and Deallocation: Functions like malloc(), calloc(), realloc(), and free() in C, or operators like new and delete in C++, are used to manage dynamic memory. These functions and operators allow programs to request memory, adjust the size of allocated memory, or release memory back to the system.
- **Fragmentation:** Dynamic memory allocation can lead to fragmentation, where free memory is divided into small blocks and is not contiguous. Fragmentation can be of two types: external fragmentation and internal fragmentation, each affecting memory utilization efficiency.

#### Memory Leaks:

A memory leak occurs when a program fails to release memory that is no longer needed. Memory leaks can lead to reduced performance or even cause a program to run out of memory, resulting in crashes or other undesired behavior.

• **Detection and Prevention:** Tools like Valgrind and AddressSanitizer can help detect memory leaks. Careful programming practices, such as always pairing an allocation with a corresponding deallocation, are essential to prevent memory leaks.

# Garbage Collection:

Some high-level programming languages, such as Java and Python, provide automatic memory management through garbage collection. Garbage collection automatically reclaims memory occupied by objects that are no longer in use by the program.

- Advantages: Reduces the burden on programmers to manage memory manually and helps prevent memory leaks and other memory management errors.
- **Trade-offs:** While it simplifies memory management, garbage collection introduces overhead and can lead to unpredictable pauses in program execution, affecting performance.

#### **Considerations:**

- Efficiency: Effective dynamic memory management is crucial for optimizing a program's memory usage and performance. This includes choosing the right allocation strategy and minimizing fragmentation.
- Security: Improper use of dynamic memory allocation can introduce vulnerabilities, such as buffer overflows and use-after-free errors, highlighting the need for careful memory handling.
- Portability: The behavior of dynamic memory allocation can vary across different environments and operating systems, necessitating portable and adaptive memory management strategies.

The next section that is being covered from this chapter this week is **Section 9.10:** Garbage Collection.

# Section 9.10: Garbage Collection

## Garbage Collection

Garbage Collection (GC) is an automatic memory management feature that recovers memory occupied by objects that are no longer in use by a program. It is a critical component of many modern programming languages, including Java, Python, and C, facilitating easier and safer memory management for developers.

#### Principles of Garbage Collection:

Garbage collection automates the task of memory deallocation that was previously manually managed by the programmer. This process involves identifying and freeing memory blocks that are no longer accessible or required by the program, ensuring efficient memory usage and preventing memory leaks.

• Reachability: An object is considered "reachable" if it can be accessed in any potential continuation of the program. Unreachable objects are eligible for garbage collection since they can no longer influence the program's execution.

- Mark and Sweep: One common algorithm used in garbage collection is the "mark and sweep" method. The "mark" phase identifies all reachable objects, while the "sweep" phase scans memory for unmarked objects, reclaiming their space.
- Generational Collection: Many garbage collectors use a generational approach, dividing objects into young and old generations based on their lifespan. Young generation objects are collected more frequently, optimizing GC performance since most objects tend to be short-lived.

### Benefits of Garbage Collection:

Garbage collection provides several benefits, including improved program stability and reduced developer overhead for memory management. However, it also introduces certain trade-offs.

- Reduced Memory Leaks: By automatically reclaiming unused memory, garbage collection significantly reduces the risk of memory leaks, which can lead to program instability and excessive resource consumption.
- Simplified Programming: Developers can focus more on application logic rather than intricate details of memory allocation and deallocation, leading to cleaner and more maintainable code.

# Challenges and Considerations:

Despite its advantages, garbage collection is not without its challenges, affecting application performance and behavior.

- Performance Overhead: The garbage collection process consumes computational resources, potentially affecting application performance. The unpredictability of GC execution can also lead to latency issues in time-sensitive applications.
- Memory Overhead: Efficient garbage collection algorithms often require additional memory for metadata and bookkeeping, increasing the overall memory footprint of applications.
- Tuning and Configuration: To mitigate performance impacts, many garbage collectors offer tuning options. Proper configuration is crucial for balancing application performance with memory management efficiency.

## **Advanced Techniques:**

- Real-time Garbage Collection: Designed for systems with strict latency requirements, real-time garbage collectors aim to minimize pause times by integrating the garbage collection process more closely with the application's execution.
- Concurrent and Parallel Garbage Collection: Modern garbage collectors often perform memory reclaiming concurrently with the application or use parallelism within the GC process itself to improve efficiency.

Garbage collection is a fundamental aspect of modern programming, offering significant benefits in memory management and program stability while presenting challenges that must be carefully managed.

The last section that is being covered in this chapter this week is **Section 9.11: Common Memory-Related Bugs in C Programs**.

#### Section 9.11: Common Memory-Related Bugs in C Programs

# Common Memory-Related Bugs in C Programs

Memory-related bugs are prevalent in C programs due to the language's low-level memory management capabilities, which give programmers direct control over memory allocation, deallocation, and access. While powerful, this control can lead to various bugs if not managed carefully. Understanding these bugs is crucial for writing robust, secure, and efficient C programs.

#### **Buffer Overflows:**

Buffer overflow occurs when a program writes data beyond the bounds of allocated memory. This can lead to unpredictable program behavior, including crashes and security vulnerabilities, such as the execution of malicious code.

- Cause: Typically caused by inadequate validation of input data sizes before writing to buffers.
- **Prevention:** Can be prevented by always checking the length of data before copying or moving it into fixed-size buffers.

#### Memory Leaks:

A memory leak happens when dynamically allocated memory is not freed after use, leading to a program consuming increasing amounts of memory over time. This can degrade system performance or cause the program to crash.

- Cause: Often occurs when pointers to allocated memory are overwritten or when the program logic bypasses the deallocation code.
- **Detection:** Tools like Valgrind can help detect memory leaks by monitoring memory allocation and deallocation at runtime.

#### Use-After-Free:

Use-after-free involves accessing memory after it has been deallocated, leading to undefined behavior, including program crashes, data corruption, or security exploits.

- Cause: Typically arises from poor management of pointer lifetimes and object ownership.
- Mitigation: Avoid by carefully tracking object lifecycles and ensuring pointers are set to NULL after free operations.

#### Double Free:

Double free errors occur when the same block of dynamically allocated memory is freed more than once, potentially corrupting the memory management data structures and leading to unpredictable behavior or security vulnerabilities.

- Cause: Usually results from logic errors where there is a loss of track of memory ownership and deallocation status.
- **Prevention:** Set pointers to NULL after freeing and before reusing them to help prevent accidental double frees.

#### Dangling Pointers:

Dangling pointers are pointers that do not point to a valid memory location. This can happen after the memory has been freed, yet the pointer is not updated, leading to potential security risks or program crashes if dereferenced.

- Cause: Often results from failing to set pointers to NULL after freeing the memory they point to.
- **Solution:** Nullify pointers immediately after freeing the associated memory to avoid dangling pointer issues.

#### **Considerations:**

- Best Practices: Adopting memory management best practices, such as consistent use of abstractions, careful tracking of ownership, and defensive programming, can mitigate many common memory-related bugs.
- Tools and Techniques: Utilize static analysis tools, dynamic analysis tools, and rigorous testing strategies to detect and prevent memory-related bugs in C programs.

Understanding and addressing common memory-related bugs are essential for developing secure, stable, and efficient C programs. Careful memory management, along with the use of appropriate tools and techniques, can significantly reduce the occurrence of these bugs.



# Week 13: (4/15 - 4/21)



# 13.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 7.1 Compiler Drives
- Computer Systems: Chapter 7.2 Static Linking
- Computer Systems: Chapter 7.3 Object Files
- Computer Systems: Chapter 7.4 Relocatable Object Files

#### 13.0.2 Piazza

Piazza discussions are optional this week. (4/23/24)

#### 13.0.3 Lectures

The lecture videos for this week are:

- Virtual Memory Safer Allocation In  $C++\approx 18$  min.
- Memory Exploits Meltdown And Spectre  $\approx 31$  min.
- Linking And Loading  $\approx 26$  min.

The lecture notes for this week are:

- Linking And Loading Linking Lecture Notes
- Linking And Loading Loading And Libraries Lecture Notes
- Linking And Loading Interposition Lecture Notes
- Security Compromise Via Speculation, Ccaches And Page Tables Lecture Notes

### 13.0.4 Assignments

The assignment for this week is:

- Shell Lab (4/23/24)
- Shell Lab Interview (4/23/24)

# 13.0.5 Chapter Summary

The chapter that is being covered this week is **Chapter 7: Linking**. The first section that is being covered from this chapter this week is **Section 7.1: Compile Drivers**.

## Section 7.1: Compile Drivers

#### Compile Drivers

Compile drivers, often referred to as "compilers" in a broader sense, are sophisticated tools that automate the sequence of compilation, assembly, and linking to transform source code into executable programs. They simplify the complex procedures involved in software development by providing a unified interface to manage these tasks.

#### Overview of Functionality:

- Compile drivers serve as the interface between the programmer and the lower-level software stack required to process and translate high-level source code into machine executable form.
- They streamline the development process by integrating the functionalities of preprocessors, compilers, assemblers, and linkers.
- Common examples include gcc, clang for C/C++ programming, and javac for Java, which are widely used in Unix/Linux and other development environments.

#### **Detailed Workflow:**

Compile drivers execute several critical steps to build executables:

- 1. **Preprocessing:** The preprocessor takes the initial source code files and processes directives such as #include (for including other source or header files) and #define (for defining macros). This step also involves removing comments and expanding macros.
- 2. **Compilation:** In this crucial phase, the preprocessed source code is compiled into assembly instructions tailored to the target processor's architecture. This involves syntax and semantic checks to ensure the code adheres to the language standards.
- 3. Assembly: Assembly language code generated by the compiler is then converted into machine code by an assembler. The output is typically in the form of object files, which contain machine code and metadata about the code like symbol tables and relocation information.
- 4. **Linking:** The linker takes multiple object files produced during assembly and combines them into a single executable program. It resolves symbols and addresses, managing external and internal dependencies efficiently.

#### Advanced Features and Usage:

- Beyond basic compilation and linking, compile drivers often offer extensive support for debugging, optimization flags, and the configuration of libraries or modules.
- They facilitate the maintenance of large codebases by automating the build sequences, ensuring consistent application builds across different development setups.
- Compile drivers also help in cross-platform software development by abstracting away the details of the target environment and handling platform-specific compilation tasks internally.

## Importance in Modern Development:

- In modern software engineering, compile drivers are indispensable for their role in automating and optimizing the development pipeline.
- They enhance developer productivity by reducing manual tasks and simplifying the integration of various code components and libraries.
- By handling various optimizations internally, compile drivers ensure that applications run efficiently on target hardware, which is crucial for performance-critical applications.

The next section that is being covered from this chapter this week is **Section 7.2: Static Linking**.

## Section 7.2: Static Linking

#### Static Linking

Static linking refers to the process of combining various pieces of program code and data into a single executable file at compile time. This linking method integrates all the required libraries and modules directly into the application's executable, which can be executed without further dependency resolution at runtime.

# Fundamental Principles:

- Static linking involves the inclusion of library routines and modules directly in the application's executable file, rather than relying on separate shared library files at runtime.
- It is performed by a linker program that searches and processes object files and libraries to create a single executable.
- One of the main advantages of static linking is that it creates self-contained executables that are portable and do not require external library dependencies on the target system.

#### Key Processes in Static Linking:

Static linking executes several systematic steps to produce a self-sufficient executable:

- 1. **Symbol Resolution:** The linker identifies and matches external symbols in the object files to their corresponding definitions in the library or other object files. It ensures that all referenced code and data are accounted for in the final executable.
- 2. **Relocation:** Adjusts code and data references in the object files so they point to the appropriate locations in the final executable. This step is necessary because the actual memory locations where the executable will be loaded cannot be known in advance.
- 3. **Space Allocation:** Allocates space for both code and data in the final executable, organizing these components as defined by the executable format (such as ELF in Unix/Linux).
- 4. **Output Generation:** Produces the final executable file, writing the linked code and data into a single file according to the executable format specifications.

#### Advantages and Disadvantages:

## • Advantages:

- Reliability: Since all necessary components are contained within the executable, the application is not susceptible to library version conflicts.
- Portability: The executable does not depend on external libraries being present on the system where it runs, enhancing its portability.
- Performance: Loading a statically linked executable can be faster than loading a dynamically linked one because there are no dynamic resolution delays.

### • Disadvantages:

- File Size: Statically linked executables are typically larger than dynamically linked ones, as they include all the code and data they need.

- Resource Use: Each statically linked executable duplicates code that could be shared between executables if dynamically linked, consuming more disk space and potentially more memory.
- *Updates and Maintenance:* Updating a statically linked library requires re-linking and redistributing the entire executable, which can complicate maintenance.

#### Common Use Cases:

- Static linking is often used in scenarios where high reliability and portability are critical, such as in embedded systems or when distributing software to environments where library availability cannot be guaranteed.
- It is also favored in situations where execution speed is a priority and the overhead of dynamic linking might be detrimental to performance.

The next section that is being covered from this chapter this week is **Section 7.3: Object Files**.

#### Section 7.3: Object Files

## Object Files

Object files are a crucial component in the software build process, acting as intermediaries between source code and the executable program. They contain machine code, data, and metadata created by the compiler during the compilation of source code. Object files are later used by linkers to create executable or shared library files.

#### Characteristics and Structure:

- Object files are produced by the compilation of source code files and contain a variety of information including compiled code (text segment), initialized data (data segment), uninitialized data (bss segment), symbols, and debugging information.
- They typically adhere to a standardized binary format, which varies by operating system and processor architecture, such as ELF (Executable and Linkable Format) on Unix-based systems, COFF (Common Object File Format) on Windows, and Mach-O on macOS.
- The structure of an object file includes headers that describe its size and layout, sections containing actual code and data, and a section header table that points to the different sections within the file.

#### **Key Components:**

Object files mainly consist of several distinct sections, each serving specific purposes:

- 1. **Header:** Contains metadata about the object file, including the format version, architecture, and size of sections.
- 2. **Text Section:** Includes executable instructions of the program. This is the code that is executed when the program runs.
- 3. Data Section: Contains initialized global and static variables.
- 4. **BSS Section:** Used for declaring variables that are not initialized by the programmer. At runtime, the operating system initializes them to zero.
- 5. **Relocation Information:** Necessary for linking, this part helps in modifying symbol references once their actual memory addresses are known.

- 6. **Symbol Table:** Lists functions and variables names used or defined in the object file, along with information about size, type, and location.
- 7. **Debugging Information:** Stores data that helps debuggers in mapping the executable code back to the source code locations.

# Practical Example

To illustrate, consider a simple C program:

```
1  #include <stdio.h>
2
3  int global_var = 5;
4
5  int main() {
6    printf("Hello, world!\n");
7    return 0;
8  }
9
```

Compiling this program (e.g., using 'gcc -c program.c') would produce an object file ('program.o'). This object file contains:

- A text section with the machine code for 'main' and 'printf'.
- A data section that includes the value of 'global\_var'.
- Relocation information that includes references to 'printf' which may be resolved during linking.
- Symbol table entries for 'main', 'global\_var', and 'printf'.

## Usage and Importance:

- Object files allow multiple source code files to be compiled separately and linked together later, supporting modular programming and improving compilation time.
- They enable the reuse of compiled code. A library's object files can be linked into different programs without needing to recompile the library.
- Object files are fundamental for static and dynamic linking processes, facilitating flexible software development and distribution models.

The last section that is being covered from this chapter this week is **Section 7.4: Relocatable Object Files**.

## Section 7.4: Relocatable Object Files

### Relocatable Object Files

Relocatable object files are a type of object file specifically designed to be used in multiple program environments by allowing the relocation of its code and data. They play a critical role in the linking phase of program compilation where different pieces of code are combined to form a single executable.

# **Core Concepts:**

• Relocatable object files are generated by compilers with the expectation that a linker will later adjust or "relocate" the addresses within them. This process enables the object files to function correctly wherever they are loaded into memory.

- They are essential for creating large programs where code and data are spread across multiple source files or when using external libraries that reside in separate files.
- The primary feature that distinguishes relocatable object files from other types is their use of relative addressing for data and function references, which requires relocation entries to be resolved by the linker.

#### **Structure and Elements:**

A relocatable object file typically includes several key elements that facilitate its integration and relocation:

- 1. **Header:** Specifies the file's architecture and provides meta-information about section sizes and the number of relocation entries.
- 2. **Text Section:** Contains the executable code with placeholders for addresses that need to be fixed at link time.
- 3. **Data Section:** Includes initialized global and static variables, again with placeholders for actual memory addresses.
- 4. **BSS Section:** Reserved for uninitialized data that does not occupy file space but requires space allocation during execution.
- 5. **Relocation Section:** Lists all the places in the text and data sections where addresses must be corrected; this includes both absolute and relative addresses.
- 6. **Symbol Table:** Contains names and other attributes of various identifiers found in the code, like variable and function names.

#### Example

Consider a C function that calculates the sum of two integers:

```
1  int add(int a, int b) {
2    return a + b;
3  }
4
```

When compiled into a relocatable object file (e.g., 'gcc -c add.c'), it would include:

- A text section with the machine code for the 'add' function.
- Relocation information that indicates where and how to adjust the machine code if the function's location in memory changes.
- A symbol table entry for 'add', marking it as an external symbol that may be referenced from other files.

#### Importance and Usage:

- Relocatable object files allow developers to compile source files independently of each other, thus facilitating incremental builds and the use of shared libraries.
- They enable the linker to optimize program layout in memory by arranging code and data segments efficiently, which can improve runtime performance and reduce resource usage.
- This type of object file is fundamental for supporting dynamic linking where executables and libraries are not fixed until program load time.

# Challenges:

- Managing and resolving relocation entries can be complex and time-consuming, especially for large applications with numerous external dependencies.
- Errors in relocation can lead to runtime errors that are difficult to debug, such as incorrect function calls or data access violations.



# Week 14: (4/22 - 4/28)



# 14.0.1 Assigned Reading

The reading assignment for this week is:

- Computer Systems: Chapter 7.5 Symbols And Symbol Tables
- Computer Systems: Chapter 7.6 Symbol Resolution
- Computer Systems: Chapter 7.7 Relocation
- Computer Systems: Chapter 7.8 Executable Object Files
- Computer Systems: Chapter 7.9 Loading Executable Object Files
- Computer Systems: Chapter 7.10 Dynamic Linking With Shared Libraries
- Computer Systems: Chapter 7.11 Linking And Loading Shared Libraries
- Computer Systems: Chapter 7.12 Position Independent Code (PIC)
- Computer Systems: Chapter 7.13 Library Interpositioning

#### 14.0.2 Piazza

Piazza discussions are optional this week. (4/30/24)

#### 14.0.3 Lectures

The lecture videos for this week are:

- Linking And Loading Loading  $\approx 14$  min.
- Linking And Loading Library Interposition  $\approx 12$  min.
- Malloc Lab Orientation  $\approx 14$  min.

## 14.0.4 Assignments

The assignment for this week is:

- Malloc Lab (5/3/24)
- Malloc Lab Extra Credit (5/3/24)

#### 14.0.5 Quiz

The quizzes for this week are:

• Chapter 7 Quiz • Chapter 7 Finalized Quiz (4/30/24)

#### 14.0.6 Chapter Summary

The chapter that is being covered this week is **Chapter 7: Linking**. The first section that is being covered from this chapter this week is **Section 7.5: Symbols And Symbol Tables**.

# Section 7.5: Symbols And Symbol Tables

### Symbols and Symbol Tables

Symbols and symbol tables are integral components of the compilation and linking processes in software development. Symbols represent various entities in code like variables, functions, and constants, and are crucial for identification and operation execution in a program. Symbol tables, on the other hand, record and organize information about these symbols, including their types, scopes, and linkage properties.

#### **Understanding Symbols:**

- In programming, symbols are the names assigned to computational entities such as functions, variables, and constants that need to be uniquely identified across the source code and during compilation.
- Symbols come with associated attributes which include type (e.g., integer, function), scope (e.g., local, global), and sometimes the address in memory where they are stored.
- The compiler uses these symbols to verify consistency in variable and function usage across the program and to help the linker in resolving references during the final assembly of the program.

#### Role of Symbol Tables:

Symbol tables are data structures that store details about the symbols used in a program:

- 1. **Structure:** They are often implemented as hash tables or dictionaries where each entry links a symbol and its detailed attributes.
- 2. **Contents:** Information in a symbol table entry typically includes the symbol's name, type, scope, memory address, and additional metadata such as the module where it is defined.
- 3. **Functionality:** During compilation, the symbol table helps in name resolution; during linking, it aids in binding references to their definitions across object files.

#### Practical Example

To illustrate, consider a simple C code snippet:

```
int global_var = 10;

int add(int a, int b) {
   return a + b;
}
```

When compiled into a relocatable object file (e.g., using 'gcc -c example.c'), the symbol table entries might include:

- An entry for 'global\_var' with attributes indicating it is a global integer variable, located at a certain memory address.
- An entry for 'add' showing it is a function taking two integers and returning an integer, along with its code's address in the text segment and scope as global.

#### Importance and Benefits:

- Error Detection: Symbol tables are crucial for compilers to detect errors like duplicate symbols or undefined references, ensuring code reliability.
- Code Optimization: They allow compilers to optimize code by understanding usage patterns and scopes of variables and functions.
- Efficient Linking: Symbol tables facilitate efficient linking by providing necessary information to resolve external and internal links during program assembly.

## Challenges in Management:

- The complexity of managing large symbol tables effectively, especially in projects with extensive codebases, requires efficient data structures and algorithms for quick lookup and updates.
- Accuracy and consistency across compilation and linking phases must be meticulously maintained to prevent runtime errors and ensure stable program execution.

The next section that is being covered from this chapter this week is **Section 7.6: Symbol Resolution**.

# Section 7.6: Symbol Resolution

## Symbol Resolution

Symbol resolution is a critical phase in the compilation and linking process where the associations between symbol references and their definitions are established. This step is essential for transforming multiple pieces of object code into a coherent executable program by ensuring that all references point to the correct locations.

#### **Process Overview:**

- Symbol resolution occurs during the linking phase where the linker looks for the definitions corresponding to all symbol references in the object files and libraries included in a program.
- The goal is to match each symbol used in the program (e.g., function and variable names) with its corresponding definition found in the same or another object file.
- This process involves a detailed lookup in the symbol tables embedded within each object file or library to find where each symbol is defined.

#### Steps Involved in Symbol Resolution:

The linker performs several key steps to achieve symbol resolution:

- 1. **Identifying Symbols:** It starts by collecting all the symbols from the symbol tables of the various object files.
- 2. **Resolving Multiple Definitions:** If multiple definitions of a symbol are found, the linker must decide which one to use based on the scope (local vs. global) and other attributes.
- 3. **Handling Undefined Symbols:** The linker also checks for symbols that are referenced but not defined in any of the linked files. This situation must be resolved either by linking with additional libraries or by notifying an error.
- 4. **Assigning Addresses:** Once all symbols have been resolved, the linker assigns final memory addresses to the symbols, updating all references within the code to these addresses.

# Practical Example

Consider two C source files, main.c and helper.c, with the following contents:

```
1    /* main.c */
2    #include <stdio.h>
3    void print_hello();
4
5    int main() {
6        print_hello();
7        return 0;
8    }
9
```

```
1  /* helper.c */
2  #include <stdio.h>
3
4  void print_hello() {
5    printf("Hello, world!\n");
6  }
7
```

Compiling these files separately generates two object files, main.o and helper.o. During the linking phase:

- The linker notices that main.o references the symbol print\_hello which is undefined within it.
- It then finds the definition of print\_hello in helper.o.
- Symbol resolution involves updating references in main.o to point to the correct location of print\_hello in helper.o.

# Challenges and Considerations:

- Complexity: The symbol resolution process can become complex in large systems with many interdependent modules and libraries.
- Errors: Unresolved symbols are a common source of errors during linking, often due to missing files or libraries, which require careful management and accurate dependency specification.
- **Performance:** Efficient symbol resolution is critical for the performance of the linking process, especially in environments where build time is a constraint.

The next section that is being covered from this chapter this week is **Section 7.7: Relocation**.

#### Section 7.7: Relocation

#### Relocation

Relocation is a fundamental process in the linking and loading phases of program compilation, where symbolic references in the code are adjusted so that they point to the correct execution addresses. This step is crucial for the flexible execution of programs across different memory environments.

#### Overview of Relocation:

- Relocation involves modifying the code and data in a program to correspond to the actual memory locations allocated to it by the system at load time or link time.
- This process ensures that the program can function correctly regardless of where it is loaded into memory, supporting the use of shared libraries and dynamic loading.

• Relocation is performed by the linker during the linking process and/or by the operating system at runtime when using dynamic linking.

## Types of Relocation:

Relocation operations can be classified into several types based on when and how they are performed:

- 1. **Static Relocation:** Done at compile-time by the linker, static relocation fixes all symbolic references before the program is run. This is typical for statically linked executables.
- 2. **Dynamic Relocation:** Occurs at runtime and is handled by the dynamic linker (part of the operating system). This type is used for shared libraries and dynamic executables where the actual memory locations cannot be determined in advance.

## **Relocation Process Steps:**

The relocation process generally includes the following key steps:

- 1. Collecting Relocation Entries: The linker or loader collects all the relocation entries from the object files or executables. These entries indicate which parts of the code need to be modified.
- 2. **Symbol Resolution:** It resolves all symbolic references to their actual memory addresses provided either by the linker's symbol table or runtime symbol information.
- 3. Address Modification: Based on the resolved addresses, the code and data sections of the program are updated so that all internal and external references point to the correct locations.
- 4. Adjustment of Program Headers: If necessary, program headers and other metadata are adjusted to reflect the new location of code and data segments.

### Practical Example

Consider a scenario where two object files, file1.o and file2.o, are linked together. Suppose file1.o contains a function call to a function defined in file2.o:

```
1  /* file1.c */
2  extern int func_in_file2();
3  int main() {
4    return func_in_file2();
5  }
6
7  /* file2.c */
8  int func_in_file2() {
9    return 42;
10  }
11
```

During linking, the linker identifies that func\_in\_file2() is defined in file2.o and updates the call in file1.o to point to the correct address in the combined executable.

 Relocation entries for func\_in\_file2() in file1.o are processed to reflect its new memory address as determined post-linking.

#### Challenges in Relocation:

- Complexity: Handling the various types of relocation and ensuring that all references are correctly updated is complex, particularly in systems with extensive code bases and multiple dependencies.
- **Performance:** Relocation, especially dynamic relocation, can impact the startup time of programs as adjustments need to be made at runtime.

• Security: Incorrectly handled relocations can lead to vulnerabilities (like buffer overflows or execution of malicious code), necessitating careful management and security checks during the relocation process.

The next section that is being covered from this chapter this week is **Section 7.8: Executable Object Files**.

#### Section 7.8: Executable Object Files

## Executable Object Files

Executable object files, also known simply as executables, are files that are directly executable by the computer's CPU. They are the final output of the compilation and linking processes and contain binary code that can be directly loaded into memory and run by the operating system.

#### Characteristics of Executables:

- Executable files are self-contained, containing all the necessary code and data (including the operating system-specific metadata) required to execute a program.
- They are distinct from source code and other types of object files in that they have been fully linked, meaning all external references have been resolved and set to their respective memory addresses.
- The format of executable files can vary depending on the operating system and architecture; common formats include Portable Executable (PE) for Windows, Executable and Linkable Format (ELF) for Unix/Linux, and Mach-O for macOS.

## Composition of Executable Object Files:

Executable files generally consist of several key sections, each serving a specific purpose in the program's execution:

- 1. **Header:** Contains metadata about the file, such as its type, architecture, and sizes of different sections.
- 2. **Text Section:** The actual machine code that is executed by the CPU. This section is what typically defines the executable as being 'executable'.
- 3. **Data Section:** Includes global and static variables that are pre-initialized in the code.
- 4. **BSS Section:** Stands for "Block Started by Symbol." It is used for declaring variables that are not initialized by the programmer; they are automatically initialized to zero by the operating system when the program runs.
- 5. **Resource Section:** (Optional, typically found in Windows executables) Contains icons, menus, and other GUI elements as well as other non-code resources used by the executable.
- 6. **Relocation Table:** Provides information on how to adjust the addresses within the executable when it is loaded into memory.
- 7. **Symbol Table and Debugging Information:** Includes information used for debugging purposes, such as the names and line numbers of the source code files that contributed to the executable code.

# Practical Example

Imagine a simple program written in C that prints "Hello, World!" to the console:

```
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
```

Compiling and linking this program using a command like 'gcc -o hello.exe hello.c' will produce an executable file named 'hello.exe', This executable will include:

- A text section containing the compiled machine code for the 'main' function and the 'printf' call.
- A data section possibly containing any global variables (if defined).
- A BSS section (if there are uninitialized global variables).
- A relocation table that indicates how the machine code might be adjusted when the program is loaded into memory.

#### Significance and Usage:

- Executable object files are crucial for the distribution of software. They allow programs to be packaged in a form that is immediately runnable on target systems without the need for further compilation or linking.
- They facilitate the implementation of complex programs by allowing various modules to be compiled separately and linked into a single, runnable file.

## Challenges in Managing Executable Files:

- Ensuring compatibility across different systems and architectures can be challenging due to differences in executable formats and system expectations.
- Security is a significant concern as executables can contain malicious code if not properly verified.
- Performance optimization is critical, as inefficiently compiled or linked executables can lead to slow program execution and poor resource management.

The next section that is being covered from this chapter this week is **Section 7.9: Loading Executable Object Files**.

#### Section 7.9: Loading Executable Object Files

#### Loading Executable Object Files

Loading executable object files into memory is a critical step in the execution process of any program. This process involves reading the executable file from disk, interpreting its contents, and placing them into memory so that the operating system can begin execution.

# Overview of the Loading Process:

• The loading of executable files is managed by the operating system's loader component, which reads the executable file's structure (headers and sections) to determine how to properly allocate

memory and prepare the program for execution.

- This process ensures that the executable is brought into the system's memory in a manner that respects the needs and constraints defined in the executable file format, such as segment sizes and permissions.
- Common executable formats include ELF (Executable and Linkable Format) for Unix/Linux systems, PE (Portable Executable) for Windows, and Mach-O for macOS, each requiring different handling by the respective loaders.

#### Key Steps in Loading an Executable:

The typical steps involved in loading an executable object file into memory include:

- 1. **Reading and Parsing Headers:** The loader first reads the headers of the executable, which contain metadata about the file format, the required memory layout, and execution start points.
- 2. **Allocating Memory:** Based on information from the headers, memory is allocated for the executable's code (text section), data (data section), and uninitialized data (BSS section).
- 3. Copying Sections: The text and data sections from the executable file are copied into the allocated memory. The BSS section is also initialized to zero.
- 4. **Relocation:** If the executable isn't position-independent, the loader adjusts internal addresses within the text and data sections according to the actual memory addresses allocated (relocation process).
- 5. **Setting Execution Permissions:** The loader sets appropriate permissions (read, write, execute) on the memory segments to protect the integrity of the executable.
- 6. **Jumping to Entry Point:** Finally, the loader transfers control to the executable's entry point (usually the 'main' function in C and C++ programs), starting program execution.

#### Practical Example

Consider the executable file 'hello.exe' generated from a simple C program:

```
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
```

Upon executing 'hello.exe', the operating system's loader performs the following tasks:

- Reads the ELF header to determine the size and location of the text, data, and BSS sections.
- Allocates memory for these sections and copies the text and data sections from disk to memory. Initializes the BSS section to zero.
- Relocates any absolute addresses based on where sections were loaded in memory.
- Sets memory permissions to prevent execution of data sections and modification of code sections.
- Transfers control to the start of the text section, effectively beginning the execution of the program.

# Importance and Challenges:

- **Importance:** Proper loading is essential for the correct and secure functioning of executable files, ensuring that all program components are correctly aligned and executable.
- Challenges: The loading process must handle various file formats and systems architectures efficiently. Additionally, it must ensure security against common exploits such as buffer overflows and code injection by setting appropriate memory permissions.

The next section that is being covered from this chapter this week is **Section 7.10: Dynamic Linking With Shared Libraries**.

# Section 7.10: Dynamic Linking With Shared Libraries

## Dynamic Linking With Shared Libraries

Dynamic linking with shared libraries is a method by which a program loads and links its dependencies (libraries) at runtime rather than at compile-time. This approach allows multiple programs to share the same library code in memory, reducing resource consumption and improving system efficiency.

### Concept of Dynamic Linking:

- Unlike static linking, where library code is copied into each executable, dynamic linking defers the resolution of functions and variables until a program is run. This means that only one copy of the library needs to be in memory, which can be used by all programs needing access to the library.
- Dynamic linking involves using a dynamic linker (sometimes called a dynamic loading routine) that loads the necessary libraries into memory at runtime and adjusts the program's function calls and variable accesses to point to the correct locations in these libraries.
- This technique is essential for using dynamically loaded libraries (DLLs on Windows or shared objects in Unix/Linux).

#### Key Processes in Dynamic Linking:

Dynamic linking typically involves several important steps:

- 1. **Identifying Library Dependencies:** The dynamic linker first identifies the shared libraries that the executable depends on using information stored in its headers.
- 2. **Loading Libraries:** The required libraries are loaded into memory if they are not already present.
- 3. **Symbol Resolution:** Function and variable names in the executable are linked to their actual data or function definitions in the library.
- 4. **Relocation:** Addresses within the loaded libraries are adjusted so that they function correctly at their new memory locations.
- 5. **Initialization:** Library initialization routines are run to prepare the library for execution in the context of the current program.

# Practical Example

Consider a program that uses the standard C library to output text to the console:

```
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
```

During execution, the operating system performs the following tasks using dynamic linking:

- Detects that 'printf' belongs to the C standard library ('libc.so' on Unix/Linux or 'msvcrt.dll' on Windows).
- Loads the 'libc.so' or 'msvcrt.dll' into memory, if it is not already loaded by another program.
- Connects the call to 'printf' in the program with the actual function in the library.
- The address of 'printf' is adjusted to point to its memory location within the dynamically loaded library.

## Benefits and Challenges:

#### • Benefits:

- Memory Efficiency: Saves memory by sharing library code among multiple programs.
- Ease of Updates: Updating a library does not require re-linking the programs that use it.
- Flexibility: Programs can use additional functionalities from new library versions without changing the executable.

#### • Challenges:

- Dependency Management: Requires careful management of library versions to avoid "DLL Hell" where incompatible library versions interfere with each other.
- Performance Overhead: Dynamic linking can introduce a runtime performance overhead due to the extra work needed to resolve symbols and load libraries.
- Security Risks: Increased complexity of managing dynamically linked libraries can lead to security vulnerabilities if not handled correctly.

The next section that is being covered from this chapter this week is **Section 7.11: Loading And Linking Shared Libraries From Applications**.

#### Section 7.11: Loading And Linking Shared Libraries From Applications

# Loading And Linking Shared Libraries From Applications

Loading and linking shared libraries from applications is a dynamic process that occurs at runtime, allowing applications to use various functionalities housed within external shared libraries (such as DLLs in Windows or SO files in Unix/Linux). This approach optimizes memory usage and enhances

application flexibility by loading only the necessary libraries when they are required.

## Understanding the Process:

- This process is managed by the operating system's dynamic linker/loader, which handles the tasks of locating the appropriate shared library files, mapping them into the application's address space, and resolving symbol references to these libraries.
- Dynamic linking with shared libraries enables applications to call functions and access data stored in these external files as if they were part of the application itself.
- The use of shared libraries reduces the overall memory footprint of applications and allows for the easy update of library code without requiring changes to the dependent applications.

#### Key Steps Involved:

The process typically involves several steps to integrate shared libraries with applications:

- 1. **Library Search:** The dynamic linker uses a set of predefined rules and environment variables (like 'LD\_LIBRARY\_PATH' in Unix) to locate the required shared library files.
- 2. **Library Loading:** Once found, the shared library files are loaded into memory. This is typically done lazily (on-demand), meaning that the loading occurs the first time a function from the library is called.
- 3. **Symbol Resolution:** The linker then performs symbol resolution where it matches function calls and variable references in the application to the actual entries within the loaded shared libraries.
- 4. **Relocation:** Necessary adjustments are made to the function and variable addresses in the application to point to the correct locations within the shared library.
- 5. **Initialization:** Any initialization routines in the shared libraries are executed to prepare the library for operation within the context of the application.

# Practical Example

Imagine an application that requires graphical capabilities provided by a library like OpenGL:

```
#include <GL/gl.h>

void renderScene() {
    glClear(GL_COLOR_BUFFER_BIT);
    glBegin(GL_TRIANGLES);
        glVertex3f(-0.5, -0.5, 0.0);
        glVertex3f(0.5, 0.0, 0.0);
        glVertex3f(0.0, 0.5, 0.0);

glEnd();
    glFlush();
}

int main() {
    renderScene();
    return 0;
}
```

During the execution of this program, the operating system performs the following actions:

- Identifies the need for the OpenGL library ('libGL.so' on Unix/Linux or 'opengl32.dll' on Windows).
- Loads the OpenGL library into memory if it's not already loaded.
- Links function calls like 'glClear' and 'glBegin' to their definitions in the OpenGL library.
- Handles any necessary address adjustments (relocation) to ensure the calls function properly.

#### Benefits and Potential Issues:

#### • Benefits:

- Modularity: Allows applications to be more modular and adaptable to different environments since they can share common library code.
- Efficiency: Reduces the memory and storage footprint of applications by sharing library code among multiple programs.
- *Upgradability:* Simplifies updates to software components by allowing individual libraries to be updated without recompiling dependent applications.

#### • Challenges:

- Complex Dependency Management: Requires careful management of library versions to prevent conflicts (commonly referred to as "DLL Hell").
- Runtime Overhead: The dynamic nature of the process can introduce delays and performance overhead due to on-the-fly symbol resolution and loading.
- Security Risks: Increases the vulnerability to security risks if not properly managed, as malicious code can potentially be injected through dynamic libraries.

The next section that is being covered from this chapter this week is **Section 7.12: Position-Independent Code (PIC)**.

# Section 7.12: Position-Independent Code (PIC)

# Position-Independent Code (PIC)

Position-Independent Code (PIC) is a type of code executable that can run correctly regardless of its absolute memory address. PIC is widely used in the implementation of shared libraries and dynamic linking because it allows a program to be loaded at any memory address without modification. This flexibility significantly enhances the security and usability of software systems by enabling address space layout randomization (ASLR).

#### Concept of Position-Independent Code:

- PIC avoids hard-coded absolute memory addresses for data and function references. Instead, it uses relative addressing and other indirect techniques to reference data and functions, which makes the executable code flexible and relocatable.
- This approach allows multiple instances of the same program to share a single copy of executable code in memory, reducing overall system memory usage.
- PIC is particularly important for operating systems that support dynamic linking and loading as it simplifies the process and increases the efficiency of these operations.

#### How PIC Works:

The creation of position-independent code involves several key techniques:

- 1. **Relative Addressing:** Instead of using absolute addresses, PIC uses offsets relative to the Program Counter (PC) or a similar register. This method ensures that the code does not depend on being located at a specific address to function correctly.
- 2. Global Offset Table (GOT): A table of addresses that is created to manage references to global variables and functions. The GOT remains at a fixed offset relative to the program's base address, allowing the actual memory addresses to be resolved at runtime.
- 3. **Procedure Linkage Table (PLT):** Used for managing function calls to external functions. The PLT works with the GOT to defer the resolution of function addresses until the functions are actually called (lazy binding).

#### Practical Example

Consider a simple C function that returns the location of a variable:

```
int global_variable = 42;

int get_global_variable() {
   return global_variable;
}
```

In a position-independent version of this code, the reference to 'global\_variable' would be handled through a GOT entry rather than a direct memory address. During execution:

- The address of 'global\_variable' in the GOT is updated dynamically based on where the shared library is loaded.
- The function 'get\_global\_variable' accesses 'global\_variable' using its GOT entry, allowing the function to execute correctly regardless of the actual physical location of the code or the data segment.

#### Advantages and Challenges:

• Advantages:

- Flexibility: Enables code to be reused at multiple memory addresses, reducing memory footprint.
- Security: Enhances security through ASLR, making it more difficult for attackers to predict the location of specific code segments.
- Efficiency: Simplifies the process of loading and linking shared libraries by eliminating the need for additional relocation steps.

## • Challenges:

- Performance Overhead: The indirect addressing mechanisms used in PIC can introduce slight runtime performance penalties compared to non-PIC.
- Complexity in Implementation: Requires more complex link-time and load-time mechanisms to manage relative addressing and symbol resolution.

The last section that is being covered from this chapter this week is **Section 7.13: Library Interpositioning**.

# Section 7.13: Library Interpositioning

# Library Interpositioning

Library interpositioning is an advanced technique used in software engineering to modify the behavior of library functions without altering their code. It allows developers to intercept and potentially alter function calls, results, and behaviors between an application and its libraries. This is particularly useful for debugging, monitoring performance, modifying functionalities, or adding new features to existing binaries dynamically.

#### Concept of Library Interpositioning:

- Library interpositioning involves placing a user-defined library between the application and the actual system libraries. This interposed library contains "wrapper" functions that are named the same as the original library functions they intend to intercept.
- When the application calls a function, the call is first routed to the corresponding function in the interposed library. The interposed function can then choose to directly pass the call along to the original function, modify the input parameters, handle the call itself, or modify the output before returning it to the application.
- This technique is implemented using dynamic linking mechanisms and can be controlled by environment variables or configuration files that specify which libraries to intercept.

## Mechanics of Interpositioning:

Interpositioning is typically achieved through the following steps:

- 1. Creating the Interposed Library: Develop a custom dynamic library that contains the interposed functions. These functions should match the signatures of the original functions they replace.
- 2. Redirecting Function Calls: Use dynamic linker features (such as the LD\_PRELOAD environment variable on Unix-like systems) to load the interposed library prior to other libraries. This ensures that the interposed functions are found first when function calls are resolved.
- 3. **Handling Calls:** In each wrapper function, decide whether to modify inputs or outputs, directly call the original function using function pointers, or perform entirely custom behavior.

# Practical Example

Imagine you want to monitor how often a program calls the 'malloc' function to allocate memory, and what sizes it requests:

```
// file: my_malloc_interposer.c
#include <stdio.h>
#include <stdlib.h>

void* malloc(size_t size) {
    printf("malloc called with size: %zu\n", size);
    void *(*original_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
    return original_malloc(size);
}
```

This custom 'malloc' function will first print the size requested, then call the original 'malloc' function to actually allocate memory. To use this interposed function, compile it into a shared library and set it to preload:

```
gcc -fPIC -shared -o mymalloc.so my_malloc_interposer.c -ldl
export LD_PRELOAD=./mymalloc.so
./your_application
```

The application 'your<sub>a</sub>pplication' will now report every call to 'malloc' as it runs.

## Advantages and Applications:

- **Debugging:** Great for tracking down memory leaks or understanding external library usage patterns without needing to modify the original source code.
- **Performance Monitoring:** Can be used to add logging or performance tracking to critical library functions to help optimize application performance.
- **Security:** Useful for injecting security checks or other features into existing binary applications in a non-invasive way.

# Challenges:

- **Complexity:** Correctly implementing library interposition requires a deep understanding of dynamic linking and the application's library dependencies.
- **Stability:** Misimplementations can lead to application instability or subtle bugs due to changes in expected function behavior.
- **Performance Impact:** While it is a powerful tool, it can introduce overheads and performance penalties, especially if the interposed functions are significantly more complex than the originals.

# Week 15: (4/29 - 5/5)



# Virtual Memory

## 15.0.1 Piazza

Piazza discussions are optional this week. (5/7/24)

### 15.0.2 Lectures

The lecture videos for this week are:

- Virtual Memory Basic Concepts  $\approx 19$  min.
- Virtual Memory Address Translation  $\approx 16$  min.
- Virtual Memory End-to-End Address Translation  $\approx 12$  min.
- Virtual Memory The Core i7/Linux Memory System  $\approx 16$  min.
- Virtual Memory Memory Mapping  $\approx 9$  min.
- Virtual Memory Dynamic Memory Basic Concepts ≈ 14 min.
- Virtual Memory Dynamic Memory Implicit Allocator  $\approx 16$  min.
- Virtual Memory Dynamic Memory Explicit And Segrated Allocators  $\approx 11$  min.
- Virtual Memory Garbage Collection And Memory Perils  $\approx 22$  min.

The lecture notes for this week are:

- Virtual Memory Systems I Lecture Notes
- Virtual Memory Systems II Lecture Notes
- Virtual Memory Systems III Lecture Notes
- Dynamic Memory Allocation Basic Concepts I Lecture Notes
- Dynamic Memory Allocation Basic Concepts II Lecture Notes
- Dynamic Memory Allocation Advanced Concepts I Lecture Notes
- Dynamic Memory Allocation Advanced Concepts II Lecture Notes

## 15.0.3 Assignments

The assignment for this week is:

- Malloc Lab (5/3/24)
- Malloc Lab Extra Credit (5/3/24)

# Week 16: (5/6 - 5/12)



# Final Exam

# 16.0.1 Exam

The exam for this week is:

- Practice Exam 4 (5/6/24)
- Exam 4 Exam 4 Finals (5/6/24)

